Chapter 1. INTRODUCTION


1.1  Background 

In an organization, how to partition a task and assign the subtasks to divisions is an important problem. This topic is closely related to the issues of ‘organizational structure’ and ‘efficiency of organizational performance’. An ‘organization’ could be a human organization, a set of interconnected machines, or a computer network. We use the word ‘agent’ or ‘division’ to refer to a constituent of an organization: for example, a person, a machine, or a processor in a computer network. Here, by organizational structure we mean purely a communication structure, namely, who should communicate with whom; the hierarchy among agents is not discussed. Efficiency generally means how quickly an organization can perform its task. To make these issues concrete, let us consider the examples in Figures 1-1, 1-2, and 1-3.

In the diagrams in these figures, the rectangles represent organizational tasks and the circles represent agents of the organization. In Figure 1-1, the dotted lines connecting task 1 and task 2 represent the coupling between the two tasks: the decision making for task 1 affects the outcome of task 2, and/or vice versa. If tasks 1, 2, 3 are assigned to agents 1, 2, 3, respectively, agents 1 and 2 must communicate. Couplings among the divisional tasks thus necessitate certain communication links among the agents, and therefore require a certain type of organizational structure. Conversely, if the organizational structure is fixed, a task must be partitioned so that subtasks assigned to disconnected agents are decoupled. For example, in a fixed organization like that of Figure 1-2, the subtasks assigned to agents 1, 2, 3 must be decoupled from those assigned to agents 4 and 5.

As for the efficiency of organizational performance, consider Figure 1-3. Suppose we have three agents of equal capacity and a global task composed of 6 mutually decoupled subtasks of equal size. Intuitively, allocation (b) is more efficient than allocation (c), because the task will be completed sooner with partition (b) than with partition (c).

One obstacle to the discussion of this task-partition problem is the conceptual difficulty of constructing mathematical models that precisely describe the behavior of organizations. A particular mathematical model has been developed in [1] based on a decentralized gradient-like algorithm. This algorithm is basically a descent-type optimization algorithm for parallel processing. Each processor has a local cost function to optimize and its own variable that it updates at each iteration. Each local cost function may, however, depend on the variables updated by other divisions. With each processor updating its own variable and communicating information with other processors, the algorithm achieves the goal of minimizing the sum of the local cost functions.

Such a decentralized gradient-like algorithm for additive cost functions can describe the behavior of an organization of boundedly rational agents as they adjust their decisions toward the objective of minimizing an organizational cost function. The minimization of the cost function is viewed as the organizational task, and the value of each variable in the cost function represents a decision made by the associated division. Alternatively, if we take the organizational objective to be optimal operation, in the sense that the variables take the values that minimize the cost function, the value of each variable can be viewed as the mode of operation of the associated division. The iterative minimization process models boundedly rational agents, who make tentative decisions based upon the available information about decisions made in other divisions and adjust those decisions as they gain additional information. Mathematically, partitioning the organizational task translates into decomposing the global cost function into a sum of subcost (or local cost) functions. The problem of partitioning an organizational task is formulated in this mathematical framework in the next section.

1.2  Problem  statement 

To formulate the problem in a way that is relevant to real-world situations and applications, it is useful to specify the purposes of partitioning the task. These purposes are classified as follows:

1.  Reduction  of  individual  load: 

When the task is complex or large in scale, a single agent with limited capacity simply cannot handle it: the agent's memory and processing power are limited. Indeed, this is why organizations are formed. We can reduce the load on each agent by partitioning the task. In our formulation, the load of each agent is measured by the complexity of the subcost function assigned to that agent. (The measure of the complexity of a subcost function will be defined precisely later.)

2.  Speed  of  performance: 

Even if one agent is able to handle the whole task, it is often better to have the agents in the organization share the task so that they finish it sooner. The speed of organizational performance may be represented by the speed of convergence of the decentralized algorithm discussed earlier.

3.  Security: 

Under special conditions, such as in defense projects, it is undesirable to place the whole operation under the control of a single agent. The ‘decentralized gradient-like algorithm for an additive cost function’ is an appropriate model of an organization serving this purpose. In this algorithm, a processor does not know the other processors' subcost functions, so each agent holds only a fraction of the organizational secrets. We also want to minimize the amount of communication, because an organization can reduce the leakage of secrets by keeping communication to a minimum. The amount of communication will be defined in two different ways in order to model different situations.

The task should therefore be partitioned so that these purposes are fulfilled. For convenience of exposition, we assume that the global cost function is quadratic and that its minimization represents the organizational task. Since the only thing that matters for our purposes is the coupling between variables introduced by the cost function, assuming a quadratic function entails no loss of generality here. We also assume that the complexity of a subcost function is measured by the number of terms in that subcost function. Mathematically, our problem is:

Given a quadratic cost function, find a decomposition into a sum of subcosts such that

Objective 1:

The number of terms in each subcost is small.

Objective 2:

The speed of convergence of the decentralized gradient-like algorithm is maximized.

Objective 3:

The amount of required communication is minimized. (If the subcost assigned to agent i depends upon a decision variable determined by agent j, agent i and agent j are required to communicate with each other according to the model.)

One could conceivably formulate a multi-objective optimization problem that accounts for all three objectives. However, our understanding of the decentralized gradient-like algorithm is too limited to solve such a problem; in particular, an analytical understanding of the speed of convergence appears to be out of reach. Moreover, the three objectives may well conflict with one another. Therefore, we deal with each objective separately. The following are some tractable problems.

Given a quadratic cost function, find a decomposition into a sum of subcost functions that

Problem 1:

minimizes the number of terms assigned to the agent with the highest load, subject to the constraint that the amount of required communication is less than a certain number, or subject to the constraint that the communication structure of the organization is restricted.

Problem 2:

minimizes the amount of required communication, subject to the constraint that the number of terms (representing load) assigned to each processor is less than a certain integer.

Problem 3:

maximizes the speed of convergence.

When designing an organization with a severe constraint on the communication structure, a designer needs to compute the maximum load of the divisions in order to determine the capabilities they require. Problem 1 provides a mathematical framework for studying this issue. Problem 2 deals with the situation where the capabilities of each division are already fixed; the issue is how to allocate the tasks cleverly so as to minimize the amount of communication. The optimal value of Problem 2 can also serve as a measure of the coupling of the task: it indicates how much cooperation it takes to perform the task when information is distributed.

A more detailed and rigorous formulation of these problems is presented in Chapter 2.


1.3  Literature  Survey 

There is little literature on the mathematical formulation of organizational behavior, and especially of task assignment strategies. Reference [1] suggests using decentralized descent-type algorithms as a model of the behavior of boundedly rational human decision makers; the mathematical framework of [1] is adopted in our research. Reference [2] discusses a design method for certain classes of human organizations. It does not develop a specific mathematical model of human organizations; rather, it suggests a general design method that applies to human organizations that already have tractable analytical models. Most other work models an organization in an information-theoretic framework. Reference [3] develops a model of interacting decision makers. A decision maker is modeled by a processor that takes an input (stimulus), processes it, and produces an appropriate output (response). Each processor can choose among several algorithms for this information processing, and internal decision strategies for choosing an algorithm are introduced. The total activity of a processor and the performance measure are expressed in terms of these internal decision strategies, and bounded rationality is modeled by constraints on the total activity of a processor. Reference [3] characterizes the internal decision strategies that give optimal or satisficing performance under constrained or unconstrained total activity. In [4], each member of an organization is again modeled as an information processor, and techniques for allocating information-processing tasks to members are discussed. Creating self-contained tasks in [4] is analogous to decomposing a global cost function in our formulation; self-contained tasks correspond to decoupled subcost functions. Reference [5] uses a queueing network model to describe a team of two decision makers, modeled as two service stations with different processing capabilities, different processing rates (expertise), and common information. There are three classes of tasks, which arrive at the team dynamically, and [5] discusses the optimal policy for selecting and allocating these tasks.

Our proposed mathematical model of an organization is simpler than the information-processor models of [3], [4], and [5], in the sense that a ‘decentralized descent algorithm’ does not describe an organization's interaction with its external environment. On the other hand, our model shows more explicitly the interactions between divisions, that is, the coupling of the organizational task.




The mapping problem in [8] is quite similar to some of our proposed problems. Consider a program made up of several modules. When these modules are executed in parallel, some modules must communicate with each other. When the parallel processors are incompletely connected, a pair of modules that must communicate should be executed on neighboring processors. The mapping problem is how to assign modules to processors so that the number of such successfully placed pairs of modules is maximized. A set of modules and their communication requirements are analogous to our subtasks and their couplings. There are, however, a few differences. In the mapping problem, the communication structure of the modules and the interconnection of the processors are fixed. In our problem, the couplings among subtasks depend upon how we decompose the global task, and how we decompose the global task is a more important issue than how we map subtasks to agents.
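To make the objective of the mapping problem concrete, the following Python sketch scores a module-to-processor assignment by the number of communicating module pairs placed on neighboring processors, and finds the best assignment by brute force. It only illustrates the objective as paraphrased above and is not the method of [8]; the function names, the one-module-per-processor restriction, and the example graphs are hypothetical.

```python
from itertools import permutations

def placed_pairs(module_edges, proc_adj, mapping):
    """Count communicating module pairs whose processors are neighbours.

    module_edges: pairs (u, v) of modules that must communicate.
    proc_adj: dict mapping each processor to the set of its neighbours.
    mapping: dict mapping each module to the processor that executes it.
    """
    return sum(1 for u, v in module_edges if mapping[v] in proc_adj[mapping[u]])

def best_mapping(modules, module_edges, proc_adj):
    """Brute force over one-to-one mappings; only sensible for tiny instances."""
    best, best_map = -1, None
    for procs in permutations(sorted(proc_adj), len(modules)):
        mapping = dict(zip(modules, procs))
        score = placed_pairs(module_edges, proc_adj, mapping)
        if score > best:
            best, best_map = score, mapping
    return best, best_map

# Example: modules 0-1-2 form a triangle and module 3 talks to module 2;
# processors p0..p3 form a ring, which contains no triangle.
modules = [0, 1, 2, 3]
module_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
ring = {"p0": {"p1", "p3"}, "p1": {"p0", "p2"}, "p2": {"p1", "p3"}, "p3": {"p2", "p0"}}
score, mapping = best_mapping(modules, module_edges, ring)
print(score)  # 3: at most two triangle edges can fit on the ring
```

Brute force is exponential in the number of modules and is used here only because the instance is tiny.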


1.4  Outline 


In Chapter 2, the mathematical model adopted for the discussion will be explained in detail, and the proposed problems will be formulated mathematically in several ways. For each formulation, a solution strategy will be briefly indicated. In the subsequent chapters (Chapters 3 and 4), each formulation and its solution will be discussed in detail.


Chapter 2. MATHEMATICAL MODEL


2.1  Decentralized  Gradient-like  Algorithm  for  an  Additive  Cost  Function 

In  this  section  we  introduce  the  decentralized  gradient-like  algorithm  for  an 
additive  cost  function  and  make  some  important  observations.  We  also  discuss  how 
this  algorithm  models  the  behavior  of  an  organization. 

For the cost function

$$J(x) = J(x_1, x_2, \dots, x_M) = \sum_{i=1}^{M} J^i(x_1, x_2, \dots, x_M),$$

processor $i$ holds the subcost $J^i$ and is responsible for the variable $x_i$. The algorithm is summarized as follows:

Algorithm 2.1.1
At each iteration $n$,

1. Each processor $j$ evaluates the partial derivative $\lambda_i^j(n) = \frac{\partial J^j}{\partial x_i}(x(n))$ for every $i$ such that $J^j$ depends on $x_i$.

2. Each processor $j$ transmits $\lambda_i^j(n)$ to processor $i$, for every processor $i$ such that $J^j$ depends on $x_i$.

3. Each processor $i$ updates $x_i$ according to

$$x_i(n+1) = x_i(n) - \gamma_i \sum_{j \,:\, J^j \text{ depends on } x_i} \lambda_i^j(n)$$

(Here, $\gamma_i$ is a positive scalar stepsize.)

4. Each processor $i$ transmits $x_i(n+1)$ to all processors $j$ whose subcosts depend upon $x_i$.

Algorithm 2.1.1 is a synchronous version of a decentralized gradient-like algorithm. Since the processors communicate all the necessary information at the end of each iteration, and since $\sum_j \lambda_i^j(n) = \frac{\partial J}{\partial x_i}(x(n))$ (subcosts that do not depend on $x_i$ contribute nothing to this derivative), the algorithm is mathematically equivalent to

$$x(n+1) = x(n) - D\,\nabla J(x(n)),$$

where $D = \mathrm{diag}(\gamma_1, \gamma_2, \dots, \gamma_M)$.

This is a steepest descent algorithm with scaling, where $\gamma_i$ is the scaling factor of the $i$-th component of the gradient.
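As a check on the equivalence stated above, the following Python sketch runs Algorithm 2.1.1 on a small quadratic cost split into three subcosts and compares one iteration with the scaled steepest descent step $x - D\nabla J(x)$. The term-list representation of a subcost, the 0-based indexing, and the helper names are illustrative assumptions, not part of the model.

```python
# Terms are (a, b, c) meaning c * x_a * x_b; a == b marks a square term.
# Indices are 0-based (x_0 here corresponds to x_1 in the text).

def partial(terms, i, x):
    """Partial derivative of one subcost with respect to x_i."""
    g = 0.0
    for a, b, c in terms:
        if a == i and b == i:
            g += 2 * c * x[i]
        elif a == i:
            g += c * x[b]
        elif b == i:
            g += c * x[a]
    return g

def variables_of(terms):
    """Indices of the variables a subcost depends upon."""
    return {a for a, _, _ in terms} | {b for _, b, _ in terms}

def sync_step(subcosts, x, gamma):
    """One iteration of the synchronous scheme, steps 1-4 in order."""
    received = [0.0] * len(x)
    for terms in subcosts:                        # processor j owns subcosts[j]
        for i in variables_of(terms):             # step 1: evaluate lambda_i^j
            received[i] += partial(terms, i, x)   # step 2: deliver it to processor i
    # Step 3: processor i applies its own stepsize gamma_i.
    # Step 4: returning the new list plays the role of broadcasting x_i(n+1).
    return [xi - gi * ri for xi, gi, ri in zip(x, gamma, received)]

def scaled_gradient_step(Q, x, gamma):
    """x - D grad J(x) with D = diag(gamma) and grad J = 2 Q x for J = x^T Q x."""
    grad = [2 * sum(qij * xj for qij, xj in zip(row, x)) for row in Q]
    return [xi - gi * g for xi, gi, g in zip(x, gamma, grad)]

Q = [[4, 1, 1], [1, 4, 1], [1, 1, 4]]
subcosts = [
    [(0, 0, 4), (0, 1, 2)],  # J^0 = 4 x_0^2 + 2 x_0 x_1
    [(1, 1, 4), (1, 2, 2)],  # J^1 = 4 x_1^2 + 2 x_1 x_2
    [(2, 2, 4), (0, 2, 2)],  # J^2 = 4 x_2^2 + 2 x_0 x_2
]
gamma = [0.05, 0.05, 0.05]
x = [1.0, -2.0, 3.0]
one_step = sync_step(subcosts, x, gamma)
same = scaled_gradient_step(Q, x, gamma)
for _ in range(200):
    x = sync_step(subcosts, x, gamma)
print(max(abs(v) for v in x))  # well below 1e-12: the minimizer of x^T Q x is 0
```

Because the synchronous iterate is $x - D\nabla J(x)$ whatever the decomposition, any other valid split of the same $J$ would produce the same sequence; in the synchronous setting a decomposition changes who must talk to whom, not the iterates.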

The asynchronous version of Algorithm 2.1.1 is basically the same as Algorithm 2.1.1, except that it does not require the $x$'s and $\lambda$'s to be transmitted at every iteration, and it allows communication delays. It has been shown in [1] that the asynchronous version of Algorithm 2.1.1 converges if the $\lambda$'s and $x$'s are transmitted frequently enough; a precise mathematical statement is given in [1].
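The effect of outdated information can be illustrated with a deliberately simplified model: in the Python sketch below, processor $i$ forms $\partial J/\partial x_i$ from copies of the other variables that are up to two iterations old. This fixed-delay model, the delay table, and the stepsize are hypothetical choices and are not the asynchronous algorithm analyzed in [1]. For this diagonally dominant $Q$ and stepsize 0.1 the update is a contraction in the max norm, the kind of setting in which bounded delays do not prevent convergence.

```python
def async_run(Q, x0, gamma, delay, iters):
    """Simplified stale-information model of the asynchronous version.

    Hypothetical model, not the exact scheme of [1]: when processor i forms
    dJ/dx_i at iteration n, it uses x_j(n - delay[i][j]) instead of x_j(n).
    delay[i][i] should be 0 because DM_i always knows its own x_i.
    For J = x^T Q x the gradient is 2 Q x.
    """
    M = len(x0)
    history = [list(x0)]  # history[n] holds x(n)
    for n in range(iters):
        current = history[-1]
        nxt = []
        for i in range(M):
            g = 0.0
            for j in range(M):
                seen = history[max(0, n - delay[i][j])]  # possibly outdated copy
                g += 2 * Q[i][j] * seen[j]
            nxt.append(current[i] - gamma[i] * g)
        history.append(nxt)
    return history[-1]

Q = [[4, 1, 0], [1, 4, 1], [0, 1, 4]]
delay = [[0, 2, 1], [1, 0, 2], [2, 1, 0]]  # delays bounded by 2 iterations
x_async = async_run(Q, [1.0, -2.0, 3.0], [0.1, 0.1, 0.1], delay, 300)
x_sync = async_run(Q, [1.0, -2.0, 3.0], [0.1, 0.1, 0.1], [[0] * 3] * 3, 300)
print(max(abs(v) for v in x_async), max(abs(v) for v in x_sync))
```

With larger stepsizes or a $Q$ that is not diagonally dominant, the same delays can slow or destroy convergence, which is consistent with the requirement in [1] that information be exchanged frequently enough.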

An important observation is that, for every pair $i$ and $j$ such that $J^j$ depends upon $x_i$, $x_i$ must be transmitted from processor $i$ to processor $j$, and $\lambda_i^j$ must be transmitted from processor $j$ to processor $i$. It is not necessary for all processors to communicate with each other: a path is required only between certain pairs of processors. A given set of subcost functions therefore requires a certain class of communication networks.
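The observation above can be turned into a small routine. The following Python sketch lists the messages that one iteration of the synchronous algorithm requires, and the unordered processor pairs that therefore need a communication path. Representing a decomposition by the sets `deps[j]` of variables each subcost depends upon, and the function names, are assumptions made for illustration.

```python
def required_messages(deps):
    """Messages implied by the observation above, as (sender, receiver, label).

    deps[j] is the set of (0-based) variable indices that subcost J^j depends on.
    Whenever J^j depends on x_i with i != j:
      x_i goes from processor i to processor j, and
      lambda_i^j goes from processor j back to processor i.
    """
    msgs = set()
    for j, variables in enumerate(deps):
        for i in variables:
            if i != j:
                msgs.add((i, j, f"x_{i}"))
                msgs.add((j, i, f"lambda_{i}^{j}"))
    return msgs

def communicating_pairs(deps):
    """Unordered processor pairs that need a communication path."""
    return {frozenset((i, j)) for j, variables in enumerate(deps) for i in variables if i != j}

# Four processors: J^0 depends on x_0, x_1 and J^2 depends on x_2, x_3.
deps = [{0, 1}, {1}, {2, 3}, {3}]
pairs = communicating_pairs(deps)  # {0,1} and {2,3}: no path is needed between the halves
msgs = required_messages(deps)     # 4 messages per iteration
```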

We view minimizing the global cost function $J(x_1, \dots, x_M) = \sum_{i=1}^{M} J^i$ as the task of an organization. The values of the $M$ variables $x_1, x_2, \dots, x_M$ represent the decisions the organization has to make to accomplish the task. In order to reduce the workload and to enhance the security of organizational information, the organization distributes the authority to make these decisions among $M$ divisions: each division $i$ has the authority to determine the value of $x_i$. Each subcost $J^i$ specifies the subtask entrusted to division $i$, and $J^i$ is known only to division $i$. Therefore, no division knows what the others are doing; this serves the security objective. Each state of the computation shows how far the organization has advanced in performing the task. The iterative computation models boundedly rational divisions (or human decision makers), which make decisions based upon partial knowledge and update those decisions as they acquire more information, hopefully in a direction that decreases the cost.

2.2  Problem  formulation 

In this section we rigorously formulate the proposed problems within the mathematical framework described in Section 2.1. To keep the discussion concise, we define the following symbols in advance.

$J$ : the global cost function

$DM_i$ : the decision maker, processor, agent, or division responsible for the value of $x_i$

$J^i$ : the subcost function assigned to $DM_i$
(If a certain term of the global cost function is in $J^i$, we say that the term is ‘assigned to $DM_i$’.)

$nt(i)$ : the number of cross terms of $J^i$
(We assume that the $J^i$'s are quadratic, so we can count their cross terms. A product of two variables counts as one term. For example, for $J^1 = x_1^2 + x_2(x_3 + x_4)$, $nt(1) = 2$.)

$\lambda_i^j$ : the partial derivative $\partial J^j / \partial x_i$

$d(DM_i)$ : the degree of $DM_i$, i.e., the number of links incident upon $DM_i$

$L(G)$ : the set of nodes of a graph $G$ whose degree is 1 (leaves)

$\max_i nt(i)$ : the maximum of $nt(i)$ over all $i$

$G_Q = (V_Q, E_Q)$ : an undirected graph with node set $V_Q$ and edge set $E_Q$

$G = (V, E)$ : an undirected graph with node set $V$ and edge set $E$


Given a cost function $J(x_1, x_2, \dots, x_M)$, there are many ways to decompose it into a sum of $M$ subcost functions $J^i$ such that $J(x_1, x_2, \dots, x_M) = \sum_{i=1}^{M} J^i$. Our problem is to find a decomposition that meets the specified objectives.
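The following Python sketch represents a decomposition as an assignment of cross terms to agents, builds the resulting subcosts, and checks numerically that they sum to $J$. It anticipates the convention of Section 2.2.2 that square terms stay with their own agent, and uses hypothetical helper names and 0-based indices.

```python
def quad_value(Q, x):
    """J(x) = x^T Q x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def build_subcosts(Q, assign):
    """Split J = x^T Q x into one subcost per agent (0-based indices).

    assign maps every cross term (i, j), i < j, Q[i][j] != 0, to an agent k.
    Square terms Q[i][i] x_i^2 stay with agent i (the convention of Section 2.2.2).
    A subcost is a list of (a, b, c) terms meaning c * x_a * x_b.
    """
    n = len(Q)
    needed = {(i, j) for i in range(n) for j in range(i + 1, n) if Q[i][j] != 0}
    if set(assign) != needed:
        raise ValueError("every cross term must be assigned exactly once")
    subcosts = [[(i, i, Q[i][i])] for i in range(n)]
    for (i, j), k in assign.items():
        subcosts[k].append((i, j, 2 * Q[i][j]))  # symmetric Q: x_i x_j appears twice in x^T Q x
    return subcosts

def subcost_value(terms, x):
    """Value of one subcost at x."""
    return sum(c * x[a] * x[b] for a, b, c in terms)

Q = [[2, 1, 0, 0], [1, 2, 1, 0], [0, 1, 2, 1], [0, 0, 1, 2]]
assign = {(0, 1): 0, (1, 2): 2, (2, 3): 3}
subcosts = build_subcosts(Q, assign)
x = [1.0, 2.0, -1.0, 0.5]
total = sum(subcost_value(t, x) for t in subcosts)
print(total, quad_value(Q, x))  # 11.5 11.5
```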

2.2.1  Global  cost  function 

As mentioned earlier, the study is restricted to the quadratic cost function

$$J(x_1, \dots, x_M) = (x_1, \dots, x_M)\, Q\, (x_1, \dots, x_M)^T,$$

where $Q$ is a symmetric matrix. To represent $J$, we will often use the undirected graph $G_Q = (V_Q, E_Q)$, where $V_Q = \{1, 2, \dots, M\}$ and $E_Q = \{(i, j) \mid i \neq j,\ Q(i, j) \neq 0\}$. ($Q(i, j)$ is the entry of $Q$ in the $i$-th row and $j$-th column.) Conceptually, the matrix $Q$, or the graph $G_Q$, characterizes an organizational task. For example, the dimension of $Q$ represents the scale of the organizational task. A dense $Q$ represents a situation where the organizational task is very intricate in nature; in other words, a decision made by one division affects many other divisions, no matter how the task is partitioned. A sophisticated coordination or cooperation strategy [7] is therefore needed for this kind of task. On the other hand, a diagonal $Q$ represents a situation where each division can perform its own task without any interaction with the other divisions.


In our formulation we assume that $G_Q$ is connected. If $G_Q$ is not connected, one can always permute the rows and columns of $Q$ to transform it into a block-diagonal matrix $Q'$. If $Q'$ has $k$ diagonal blocks, the cost $J$ can be decomposed into $k$ completely decoupled subcosts, and the problem reduces to our formulation by considering each connected component of $G_Q$ separately. We also assume that the diagonal elements of $Q$ are nonzero.
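Building $G_Q$ and recovering the block-diagonal form is mechanical. The Python sketch below, which uses 0-based indices and hypothetical helper names, finds the connected components of $G_Q$ by breadth-first search and reorders the rows and columns of $Q$ accordingly.

```python
from collections import deque

def graph_of(Q):
    """Adjacency sets of G_Q (0-based): edge {i, j} iff i != j and Q[i][j] != 0."""
    n = len(Q)
    return {i: {j for j in range(n) if j != i and Q[i][j] != 0} for i in range(n)}

def components(adj):
    """Connected components of an undirected graph, found by breadth-first search."""
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(sorted(comp))
    return comps

def block_diagonalize(Q):
    """Permute rows and columns of Q so that each component of G_Q becomes one block."""
    comps = components(graph_of(Q))
    order = [i for comp in comps for i in comp]
    return comps, [[Q[a][b] for b in order] for a in order]

Q = [[3, 0, 1, 0], [0, 2, 0, 1], [1, 0, 3, 0], [0, 1, 0, 2]]
comps, Q_block = block_diagonalize(Q)
# comps == [[0, 2], [1, 3]]; Q_block has two 2x2 diagonal blocks
```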

2.2.2  Assumptions  on  decomposition 

We assumed in the previous section that

$$Q(i, i) \neq 0, \quad i = 1, 2, \dots, M.$$

This means that the global cost function $J$ contains every square term. We now require that each square term $Q(i, i)\,x_i^2$ be in $J^i$ in the decomposition. The interpretation of this assumption is that each division $DM_i$ must take on the partial task $Q(i, i)\,x_i^2$, which does not involve the decisions of other divisions. This assumption keeps the mathematical formulation clean.

If $J^j$ depends upon $x_i$, there must be a path from $DM_i$ to $DM_j$ through which $x_i$ is communicated, and a path from $DM_j$ to $DM_i$ through which $\lambda_i^j$ is communicated, as noted in the previous section. We will consider the problem of decomposing a global cost function under two different assumptions concerning the communication between such pairs of processors.

The first assumption is that if $J^j$ depends upon $x_i$, there must be a direct bidirectional link between $DM_i$ and $DM_j$. We call this assumption ‘Direct Communication’. It is motivated by the security purpose: if information about $x_i$ or $\lambda_i^j$ travels from its source to its destination through other DM's, those DM's have a chance to acquire that information. This is undesirable for the security of information, so we want a direct link between $DM_i$ and $DM_j$. The assumption is also motivated by speed considerations: if information is relayed by intermediate DM's, the delays of information transfer will be longer, and the intermediate DM's will be unnecessarily burdened with carrying information in transit.

The second assumption is more relaxed. We only require that $DM_i$ and $DM_j$ be connected through a series of links if $J^j$ depends upon $x_i$. The values of variables or partial derivatives may then be relayed through other processors between such a pair of processors. We call this assumption ‘Indirect Communication’.
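The difference between the two assumptions can be expressed as a feasibility check of a decomposition against a fixed network. In the Python sketch below, which uses hypothetical names and 0-based indices, `deps[j]` is the set of variables $J^j$ depends upon and `adj` is the undirected network; ‘Direct Communication’ requires an edge for every needed pair, while ‘Indirect Communication’ requires only a path.

```python
from collections import deque

def needed_pairs(deps):
    """Pairs (i, j), i != j, such that J^j depends on x_i; deps[j] lists J^j's variables."""
    return {(i, j) for j, variables in enumerate(deps) for i in variables if i != j}

def reachable(adj, s, t):
    """Breadth-first search: can information travel from s to t over the network?"""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def feasible(deps, adj, mode):
    """Check a decomposition against a fixed network under one of the two assumptions."""
    if mode == "direct":
        return all(j in adj.get(i, ()) for i, j in needed_pairs(deps))
    if mode == "indirect":
        return all(reachable(adj, i, j) for i, j in needed_pairs(deps))
    raise ValueError(f"unknown mode: {mode}")

# Network: a line DM_0 - DM_1 - DM_2. J^2 depends on x_0, so DM_0 and DM_2 must exchange data.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
deps = [{0}, {1}, {0, 2}]
print(feasible(deps, adj, "direct"), feasible(deps, adj, "indirect"))  # False True
```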


From  these  assumptions,  the  following  lemma  immediately  follows. 

Lemma 2-1

Let the global cost function be $J = x^T Q x$. If $Q(i, j) \neq 0$ for some $i \neq j$, there must be a path between $DM_i$ and $DM_j$ in $G$ no matter how $J$ is decomposed. Moreover, $DM_i$ and $DM_j$ are within two hops of each other under the ‘Direct Communication’ assumption.

Proof

Suppose that $x_i x_j$ is in $J^k$ in a decomposition. Since $J^k$ depends upon $x_i$ and $x_j$, the value of $x_i$ and the partial derivative $\lambda_i^k$ must flow between $DM_i$ and $DM_k$, so a path between $DM_i$ and $DM_k$ is required. Likewise, a path between $DM_j$ and $DM_k$ is required. Therefore, there must be a path between $DM_i$ and $DM_j$ in $G$.

Under the ‘Direct Communication’ assumption, a link must exist between $DM_i$ and $DM_k$, and between $DM_j$ and $DM_k$. Therefore, $DM_i$ and $DM_j$ are within two hops in $G$. (If $k = i$ or $k = j$, the result holds trivially, since a direct link between $DM_i$ and $DM_j$ is then required.)
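Lemma 2-1 can be checked exhaustively on a small example. The Python sketch below enumerates every assignment of the cross terms of a four-variable $Q$ to agents, builds the links forced under ‘Direct Communication’, and confirms that every coupled pair is within two hops. The example matrix and helper names are illustrative; an exhaustive check on one instance supports, but does not replace, the proof.

```python
from itertools import product

def forced_links(assign):
    """Links required under Direct Communication.

    assign maps each cross term (i, j) to the agent k holding it; holding x_i x_j
    forces links {i, k} and {j, k} (a link to oneself is not needed).
    """
    links = set()
    for (i, j), k in assign.items():
        for a in (i, j):
            if a != k:
                links.add(frozenset((a, k)))
    return links

def neighbours(links, u):
    """Nodes joined to u by a link."""
    return {v for link in links if u in link for v in link if v != u}

def within_two_hops(links, i, j):
    """True if i and j are adjacent or share a common neighbour."""
    ni = neighbours(links, i)
    return j in ni or bool(ni & neighbours(links, j))

def check_lemma(Q):
    """Test the two-hop claim over every possible assignment of cross terms."""
    n = len(Q)
    cross = [(i, j) for i in range(n) for j in range(i + 1, n) if Q[i][j] != 0]
    for agents in product(range(n), repeat=len(cross)):
        links = forced_links(dict(zip(cross, agents)))
        if not all(within_two_hops(links, i, j) for i, j in cross):
            return False
    return True

# A triangle 0-1-2 with node 3 attached to node 2: 4 cross terms, 4^4 = 256 assignments.
Q = [[2, 1, 1, 0], [1, 2, 1, 0], [1, 1, 2, 1], [0, 0, 1, 2]]
print(check_lemma(Q))  # True
```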

2.2.3  Measure  of  objectives 

This section gives the mathematical definitions of the three objectives of decomposition, namely:

Objective  1:  balance  of  loads 

Objective  2:  the  amount  of  communication 

Objective  3:  the  speed  of  convergence 

Objective 1 reflects the idea that one purpose of a decomposition is to balance the loads that fall on the divisions of an organization. In our model (the decentralized gradient-like algorithm for additive cost functions), the load on division $DM_i$ arises from the task of minimizing the subcost function $J^i$. A reasonable measure of the load on $DM_i$ is therefore the amount of effort or resources $DM_i$ must expend to minimize $J^i$, including memory space and computational operations. In Algorithm 2.1.1, the local memory of processor $DM_i$ must contain the subcost function $J^i$, the set of variables

$$SV(i) = \{x_j \mid J^i \text{ depends upon } x_j\},$$

and the set of partial derivatives

$$SP(i) = \{\lambda_i^k \mid J^k \text{ depends upon } x_i\}.$$

Notice that

$|SV(i)|$ = the number of variables $J^i$ depends upon,

$|SP(i)|$ = the number of subcost functions that depend upon $x_i$.

As for the computational operations, $DM_i$ must receive the values of partial derivatives from $|SP(i)|$ processors in order to update $x_i$, and it must add these $|SP(i)|$ partial derivatives to perform the update (recall step 3 of Algorithm 2.1.1). $DM_i$ must then send the updated $x_i$ to the $|SP(i)|$ processors whose subcost functions depend upon $x_i$. $DM_i$ also has to compute the $|SV(i)|$ partial derivatives of $J^i$ and send them to the $|SV(i)|$ corresponding processors.

Therefore, an accurate measure of the load on processor $DM_i$ is a function not only of $J^i$ but of the whole decomposition of $J$. Such a measure would be the sum of the following costs: storing $J^i$; storing variables, proportional to $|SV(i)|$; storing partial derivatives, proportional to $|SP(i)|$; receiving partial derivatives, proportional to $|SP(i)|$; adding $|SP(i)|$ partial derivatives; transmitting its own variable, proportional to $|SP(i)|$; computing $|SV(i)|$ partial derivatives; and transmitting these partial derivatives, proportional to $|SV(i)|$.
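The bookkeeping above can be sketched in Python as follows. The code computes $SV(i)$ and $SP(i)$ from the sets of variables each subcost depends upon, and tallies the per-processor operations with unit costs; the unit-cost tally, the names, and the 0-based indexing are assumptions for illustration. The number of partial derivatives $DM_i$ computes in step 1 of Algorithm 2.1.1 is taken to be $|SV(i)|$, one per variable $J^i$ depends upon.

```python
def sv_sp(deps):
    """SV(i) and SP(i) as index sets (0-based).

    deps[i] is the set of variables J^i depends upon.
    SV(i) = {j : J^i depends on x_j}   (the variables DM_i must store)
    SP(i) = {k : J^k depends on x_i}   (k indexes lambda_i^k, received by DM_i)
    """
    n = len(deps)
    SV = [set(d) for d in deps]
    SP = [{k for k in range(n) if i in deps[k]} for i in range(n)]
    return SV, SP

def operation_counts(deps):
    """Per-processor operation counts with unit costs (a simplifying assumption)."""
    SV, SP = sv_sp(deps)
    return [
        {
            "receive_derivatives": len(SP[i]),
            "add_derivatives": len(SP[i]),
            "send_variable": len(SP[i]),
            "compute_and_send_derivatives": len(SV[i]),
        }
        for i in range(len(deps))
    ]

# The example from the symbol list, shifted to 0-based indices:
# J^0 = x_0^2 + x_1 (x_2 + x_3); the other subcosts hold only their square terms.
deps = [{0, 1, 2, 3}, {1}, {2}, {3}]
SV, SP = sv_sp(deps)
```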

To simplify our analysis, we use an approximate measure of the load on each processor: we define the load of processor $DM_i$ to be the number of cross terms $J^i$ contains. This definition is possible because our global cost function is quadratic, so the number of cross terms in each subcost is well defined. The advantage of this definition is that the decomposition process can be viewed as assigning each cross term to a processor: every time a cross term of the global cost function is assigned to a processor, the load on that processor increases by one. This simplifies the analysis considerably.


This definition of load is a reasonably faithful approximation to the accurate measure of load stated above. As the number of terms in a subcost function grows, so does the memory space required to store it. Moreover, if the number of cross terms in $J^i$ is large, $J^i$ generally depends upon many variables (a large $|SV(i)|$). We use $nt(i)$ to denote the number of terms assigned to $DM_i$.

We define the measure of balance to be the maximum of these numbers of terms over all processors, i.e., $\max_i nt(i)$. For the same global cost function $J$, a smaller $\max_i nt(i)$ means a more balanced decomposition.
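With load measured by cross terms, $nt(i)$ and $\max_i nt(i)$ follow directly from an assignment of cross terms to divisions. The Python sketch below, with hypothetical names and 0-based indices, compares two decompositions of a star-shaped $G_Q$.

```python
from collections import Counter

def loads(n, assign):
    """nt(i) for i = 0..n-1: cross terms assigned to DM_i (square terms excluded)."""
    counts = Counter(assign.values())
    return [counts[i] for i in range(n)]

def balance(n, assign):
    """The balance measure max_i nt(i); smaller is more balanced."""
    return max(loads(n, assign))

# Star-shaped G_Q: node 0 is coupled to nodes 1..4.
cross_terms = [(0, 1), (0, 2), (0, 3), (0, 4)]
centralized = {t: 0 for t in cross_terms}       # the hub keeps every cross term
spread = {(0, j): j for (_, j) in cross_terms}  # each leaf takes its own term
print(loads(5, centralized), balance(5, centralized))  # [4, 0, 0, 0, 0] 4
print(loads(5, spread), balance(5, spread))            # [0, 1, 1, 1, 1] 1
```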


Measuring the amount of communication is, again, extremely complicated if we follow our algorithmic model literally: we would have to count every single message transmitted and received, and a theoretical analysis of this count is intractable. Even if we could count all the message flows, the result would mean nothing more than the cost of communication in parallel processing. We therefore present two approximate measures of the amount of communication and explain how these measures can be interpreted more meaningfully.

First, we consider the number of processor pairs that need to communicate with each other. As stated above, $DM_i$ and $DM_j$ must communicate with each other if $J^i$ depends upon $x_j$ or $J^j$ depends upon $x_i$. We count the number of such pairs and use this number as the amount of communication. As long as a pair of processors needs to communicate, we do not care how many bits of information are actually exchanged; this is a convenient simplification. Under the first assumption on decomposition in Section 2.2.2 (‘Direct Communication’), this number of processor pairs is the total number of links needed among the processors to perform the organizational task with the corresponding partition of tasks. We call this measure of the amount of communication TL (Total number of Links). If every direct channel has the same probability of information leakage, TL can also be interpreted as a measure of the risk of leaking organizational secrets. If an organization has a fixed structure in terms of its communication ability (this will be explained in Section 2.2.4), TL is the total number of edges in the graph representing that fixed organizational structure.
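Under ‘Direct Communication’, TL can be computed from an assignment of cross terms to agents. In the Python sketch below, whose names and examples are hypothetical and whose indices are 0-based, a cross term $x_i x_j$ held by $DM_k$ makes $J^k$ depend on $x_i$ and $x_j$, so $DM_k$ must communicate with $DM_i$ and $DM_j$; TL is the number of distinct pairs so produced.

```python
def communicating_pairs(assign):
    """Processor pairs that must communicate, given cross term (i, j) -> agent k.

    Holding x_i x_j makes J^k depend on x_i and x_j, so DM_k must communicate
    with DM_i and with DM_j (unless k is one of them).
    """
    pairs = set()
    for (i, j), k in assign.items():
        for a in (i, j):
            if a != k:
                pairs.add(frozenset((a, k)))
    return pairs

def tl(assign):
    """TL: the number of distinct communicating pairs (links under Direct Communication)."""
    return len(communicating_pairs(assign))

# Path-shaped G_Q: 0-1-2-3.
local = {(0, 1): 0, (1, 2): 1, (2, 3): 2}   # each term kept by one of its own variables' owners
remote = {(0, 1): 3, (1, 2): 1, (2, 3): 2}  # term x_0 x_1 handed to an outsider
print(tl(local), tl(remote))  # 3 4
```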

As an alternative, we define the amount of communication slightly differently from TL. Decomposing a quadratic global cost function J(x_1, x_2, ..., x_M) can be viewed as assigning each cross term of J to one of the processors DM_1, DM_2, ..., DM_M. If the cross term x_i x_j is allocated to DM_k, k ≠ i, k ≠ j, two links must exist: a direct link from DM_i to DM_k and a direct link from DM_j to DM_k. If the cross term x_i x_j is allocated to DM_i or DM_j, one link must exist: a link between DM_i and DM_j. From this fact we can define the measure of necessary communication as the sum, over all cross terms, of the number of links introduced by this rule. We call this number 'RL' (Repeated number of Links). This is not the exact number of necessary links under the assumptions of the previous chapters, because links can be counted repeatedly for the same pair of processors that need to communicate with each other. (For this reason RL will often be called 'superposed link'.) However, this definition of communication load serves as an approximate measure. There is another interpretation for this measure RL. The work load nt(i) of each division is defined to be the number of terms assigned to that division DM_i. Let us imagine that each division consists of individuals, and each individual is responsible for one term. Within the same division, every individual knows the division's own decision variable. However, for security purposes, individuals are not allowed to pass on to one another the values of decision variables they receive from other divisions. In this set-up, division DM_i must communicate with every individual in other divisions who is assigned a cross term involving x_i. Therefore, RL is the total number of communication channels in this set-up.
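A similar sketch shows the difference between TL and RL. It uses the same illustrative encoding: processors 0, ..., M - 1, DM_i owning x_i, and a dictionary from cross terms (i, j) to processor indices. RL adds one for each cross term kept by one of its owners and two otherwise, without merging repeated pairs. Here nt(i) counts cross terms only, which is an assumption of the sketch. The names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

CrossTerm = Tuple[int, int]  # (i, j) stands for the cross term x_i x_j, with i < j


def repeated_links(assignment: Dict[CrossTerm, int]) -> int:
    """RL: +1 when x_i x_j stays with DM_i or DM_j, +2 otherwise; repeats are counted."""
    return sum(1 if k in term else 2 for term, k in assignment.items())


def workloads(assignment: Dict[CrossTerm, int], m: int) -> List[int]:
    """nt(i) for i = 0, ..., m-1, counting cross terms only."""
    counts = Counter(assignment.values())
    return [counts[i] for i in range(m)]


shared = {(0, 1): 2, (0, 2): 2}              # both cross terms go to DM_2
owners = {(0, 1): 0, (1, 2): 1, (0, 2): 2}   # each cross term kept by an owner
print(repeated_links(shared), workloads(shared, 3))  # -> 3 [0, 0, 2]
print(repeated_links(owners), workloads(owners, 3))  # -> 3 [1, 1, 1]
```

In the first case RL = 3, although only the two distinct links DM_0-DM_2 and DM_1-DM_2 are needed. Keeping every cross term with one of its owners gives RL = |E_Q|.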


As for the speed of convergence, the best measure for mathematical analysis would be the rate of convergence of the algorithm. However, a reasonable, coherent definition of the rate of convergence for a distributed and asynchronous algorithm is very messy mathematically, and it is also very difficult to compute, so this theoretical approach becomes intractable. For example, the speed of convergence depends on many factors other than how we decompose the global cost function, such as the stepsize and the frequency of communication between processors. Finding analytical relations between these factors and the speed of convergence is mathematically intractable. The only way to gain some understanding is to perform insightful computer simulations; one would want to show through simulation that balanced assignments of subcosts generally improve the speed of convergence. Therefore, the analysis of the speed of convergence must inevitably be done heuristically. We suggest simulating Algorithm 2.1.1, counting the number of iterations of the simulation program until it is sufficiently close to the solution, and using this count as the measure of the speed of convergence. The simulation program should be designed so that this iteration count represents the real time taken to run Algorithm 2.1.1. The issue, then, is how to decompose a global cost function in order to meet this objective of fast convergence. Intuitively, a balanced decomposition yields faster convergence. Therefore, we conclude that Objective 2 (fast convergence) of Section 1.2 is, in general, embedded in Objective 1 (balance of loads).

2.2.4  Three  types  of  organizations 

We have rigorously stated the objectives of decomposing a global cost function (or partitioning an organizational task) and our assumptions. In this section we classify organizations into three classes: fixed organizations, flexible organizations, and semi-flexible organizations. This classification is based on the flexibility of their communication structure. ('Communication structure' here simply means which pairs of divisions or processors can communicate directly. It can be represented by a graph in which each node symbolizes a division or a processor, and each link symbolizes a pair of processors that can communicate directly with each other.) Our problem is to decompose the global cost function so as to meet the objectives (Section 2.2.3) under the assumptions in Section 2.2. This problem takes on distinctive features when posed for these different types of organizations.

In a fixed organization, the communication structure is strictly fixed. Each division is determined a priori to be responsible for certain decision variables. (The decision variables of an organization translate mathematically to the variables of J in Section 2.2.1.) Which pairs of these divisions can communicate directly with each other is also determined a priori. Since the task allocator (or task-allocating algorithm) has no authority to specify or modify the communication structure of the organization, it has to find a decomposition that requires communication only through the predetermined links. (Recall that if J_i depends upon x_j, the processor responsible for x_i and the processor responsible for x_j must communicate with each other.)


In a flexible organization, the communication structure is not predetermined. The task allocator (or task-allocating algorithm) has the authority to impose a communication structure on the organization.

In a semi-flexible organization, the communication structure of the organization is fixed. However, which processor is responsible for which decision variable is not determined a priori. The task allocator (or task-allocation algorithm) has the authority to determine the mapping from the set of decision variables to the set of processors, as well as the decomposition of the global cost function.

2.2.5  Specific  formulations 

Our problem is to find a decomposition that meets the objectives specified in Section 2.2.3 under the assumptions specified in Sections 2.2.1 and 2.2.2. We are concerned with two objectives: the balance of loads and the reduction of the amount of communication. We have concluded that the speed of convergence (Objective 3) is embedded in the objective of balanced load. In Section 2.2.3 we suggested two definitions of the amount of communication, and in Section 2.2.2 we suggested two different assumptions on decomposition, 'Direct Communication' and 'Indirect Communication'. With two definitions of the amount of communication and two assumptions to choose from, there are four combinations for formulating the problem. Each of these four problems can be matched to the three types of organizations specified in Section 2.2.4. Therefore, we have twelve cases to consider.


2.2.5-1  ‘Direct  Communication’ 


<  TL  as  the  amount  of  communication  > 

A.  Fixed  organization 

Since the organization has a fixed communication structure, the total number of links is also fixed: TL does not depend on the decomposition of J. Therefore, we can only consider the balance of load. Our problem is, then, to find a decomposition that minimizes max_i nt(i) such that the 'Direct Communication' assumption is satisfied.

Load Balancing in a Fixed Structure
Given G_Q = (V_Q, E_Q) and a graph G = (V, E), find a decomposition that minimizes max_i nt(i).


B.  Flexible  organization 

Let us consider TL as our objective. If we do not care about the other objective (the balance of loads), the problem of minimizing TL becomes trivial. If we assign the whole global cost function to a single processor, TL is M - 1, where M is the number of variables (M = |V_Q|). TL is M - 1 because J depends on all the variables x_1, x_2, ..., x_M, so that processor must be linked to each of the other M - 1 processors. M - 1 is the minimum TL because G has M nodes and is connected by Lemma 2-1. (Recall that we assumed G_Q is connected.)

If we care about the balance of load while minimizing TL, the problem becomes nontrivial.


Minimal Link

Given J = x^T Q x and nt_i^*, find a decomposition that minimizes TL such that nt(i) ≤ nt_i^*, i = 1, 2, ..., M.

This problem models a situation in which each division has a limited capacity (nt_i^*), and we want to partition an organizational task (minimizing J) so that the size of each divisional task (nt(i)) is within the capacity of the corresponding division. The aim of the decomposition is to minimize the necessary amount of communication (TL).

Now, let us consider max_i nt(i) as our objective. Again, if we do not care about the amount of communication, the problem becomes trivial: we split the cross terms of J equally and assign the pieces to processors, which produces max_i nt(i) = ⌈|E_Q|/M⌉.
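The trivial equal split can be sketched as dealing the cross terms out to the processors in round-robin order. For illustration we assume processors numbered 0, ..., M - 1 and an nt(i) that counts cross terms only. Round-robin is only one way to split equally, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

CrossTerm = Tuple[int, int]  # (i, j) stands for the cross term x_i x_j


def round_robin_split(cross_terms: List[CrossTerm], m: int) -> Dict[CrossTerm, int]:
    """Deal the cross terms out to DM_0, ..., DM_{m-1} in turn, ignoring communication."""
    return {term: idx % m for idx, term in enumerate(cross_terms)}


def max_load(assignment: Dict[CrossTerm, int], m: int) -> int:
    """max_i nt(i), counting cross terms only."""
    loads = [0] * m
    for k in assignment.values():
        loads[k] += 1
    return max(loads)


terms = [(i, j) for i in range(4) for j in range(i + 1, 4)]  # 6 cross terms
print(max_load(round_robin_split(terms, 4), 4), math.ceil(len(terms) / 4))  # -> 2 2
```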

If  we  restrict  the  amount  of  communication,  the  problem  becomes  nontrivial. 

Load Balancing with Limited Link

Given J = x^T Q x and TL^*, find a decomposition that minimizes max_i nt(i) such that TL ≤ TL^*.



C.  Semi-flexible  organization 

As in the fixed organization, TL is fixed in this case. The problem is to find an assignment of decision variables and subcost functions to processors that minimizes max_i nt(i).

Mapping for Load Balancing

Given G_Q and G = (V, E), find a decomposition J(x) = Σ_i J_i(x) and a matching between {x_1, x_2, ..., x_M} and V that together minimize max_i nt(i).

Even before the issue of finding a decomposition J(x) = Σ_i J_i(x) is discussed, finding a feasible matching between {x_1, x_2, ..., x_M} and V is a nontrivial problem. It will be discussed in detail in Chapter 3.


<  RL  as  the  amount  of  communication  > 

A.  Fixed  organization 

Let us consider RL as an objective. The problem is, then,

Minimal Superposed Link in a Fixed Structure
Given a graph G, G_Q, and nt_i^*, find a decomposition that minimizes RL such that messages are transferred only through the edges of G, and such that nt(i) ≤ nt_i^*, i = 1, 2, ..., M.

(We can drop the constraint on the capacity of each processor by letting nt_i^* = ∞ for all i.)

We can also consider max_i nt(i) as our objective.

Load Balancing with Limited Superposed Links in a Fixed Structure
Given a graph G, G_Q, and RL^*, find a decomposition that minimizes max_i nt(i) such that messages are transferred only through the edges of G, and such that RL ≤ RL^*.

B.  Flexible  organization 

As in the case of (TL, Flexible organization), the problem of minimizing RL becomes trivial if we do not care about the balance of loads. We know that RL increases by either one or two whenever we assign a cross term. If we do not care about the balance of loads, we assign each cross term x_i x_j to either DM_i or DM_j. This way, RL increases by only one for each cross term. Therefore, RL is |E_Q|, and this is the minimum. As in the case of (TL, Flexible organization), we can put constraints on the capacity of each processor.

Minimal Superposed Link

Given J = x^T Q x and nt_i^*, find a decomposition that minimizes RL such that nt(i) ≤ nt_i^*, i = 1, 2, ..., M.

If we use max_i nt(i) as our objective, the problem is

Load Balancing with Limited Superposed Link
Given J = x^T Q x and RL^*, find a decomposition that minimizes max_i nt(i) such that RL ≤ RL^*.

C.  Semi-flexible  organization 

Mapping for Minimal Superposed Link
Given G_Q, G = (V, E), and nt_i^* for i = 1, 2, ..., M, find a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition that minimize RL such that nt(i) ≤ nt_i^*, i = 1, 2, ..., M.

(We can drop the constraint on the capacity of each processor by letting nt_i^* = ∞ for all i.)

Mapping for Load Balancing with Limited Superposed Link
Given G_Q, G = (V, E), and RL^*, find a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition that minimize max_i nt(i) such that RL ≤ RL^*.


2.2.5-2  ‘Indirect  Communication’

<  TL  as  the  amount  of  communication  > 

A.  Fixed  organization 

Since G is fixed, TL is also fixed. The only objective we can consider is max_i nt(i). The problem is, then,

Given G_Q = (V_Q, E_Q) and a graph G = (V, E), find a decomposition that minimizes max_i nt(i).

We have assumed that G_Q is connected. Therefore, if |V_Q| = |V| and G is not connected, no feasible decomposition exists. On the other hand, as long as G is connected, the problem becomes trivial. Since we allow indirect communication, any pair of processors can exchange messages as long as G is connected. Therefore, we can split the cross terms of J equally and assign the pieces to processors. Then,

$$\max_i nt(i) = \left\lceil \frac{|E_Q|}{M} \right\rceil .$$


B.  Flexible  organization 


In this case we can trivially find a decomposition that minimizes both TL and max_i nt(i) at the same time. We have assumed that G_Q is connected, so G ends up being connected for any decomposition of J. Therefore, the minimum TL is M - 1 (G is a tree). Moreover, since we allow indirect communication, any pair of processors can communicate with each other in a tree-structured organization. Thus we can achieve

$$\max_i nt(i) = \left\lceil \frac{|E_Q|}{M} \right\rceil .$$


C.  Semi-flexible  organization 

TL is fixed. Allowing indirect communication again trivializes the problem of minimizing max_i nt(i). Again, if G is not connected, the problem is infeasible. As long as G is connected, for any one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

we can achieve

$$\max_i nt(i) = \left\lceil \frac{|E_Q|}{M} \right\rceil$$

by splitting the cross terms of J equally.


<  RL  as  the  amount  of  communication  > 


A.  Fixed  organization 


Under the 'Direct Communication' assumption, whenever a cross term x_i x_j was assigned, RL increased by either one or two. Also, x_i x_j could be assigned only to processors that have direct access to both DM_i and DM_j. Under the 'Indirect Communication' assumption, x_i x_j can be assigned to any processor that has a path in G to both DM_i and DM_j. Whenever x_i x_j is assigned to DM_k, RL increases by nd(DM_i, DM_k) + nd(DM_j, DM_k), where nd(DM_i, DM_j) is the length of the shortest path between DM_i and DM_j when every edge has length 1. (nd stands for 'nominal distance'.) Therefore, when we choose RL as an objective, the problem is formulated as follows:

Minimal Message Ambulation
Given a graph G, G_Q, and nt_i^*, find a decomposition that minimizes

$$\sum_k \sum_{x_i x_j \in J_k} \bigl[ nd(DM_i, DM_k) + nd(DM_j, DM_k) \bigr]$$

such that

$$nt(i) \le nt_i^*, \quad i = 1, 2, \ldots, M.$$

(We can drop the constraint on the capacity of each processor by letting nt_i^* = ∞ for all i.)
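For a fixed G, the nominal distances, and with them the quantity minimized above, can be computed by a breadth-first search from each processor. The sketch below assumes that G is connected and is given as an adjacency list over processor indices 0, ..., M - 1, with DM_i owning x_i. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

CrossTerm = Tuple[int, int]  # (i, j) stands for the cross term x_i x_j


def nominal_distances(adj: Dict[int, List[int]], src: int) -> Dict[int, int]:
    """nd(src, .): breadth-first hop counts in G, every edge having length 1."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def message_ambulation(adj: Dict[int, List[int]], assignment: Dict[CrossTerm, int]) -> int:
    """Sum of nd(DM_i, DM_k) + nd(DM_j, DM_k) over cross terms x_i x_j placed in J_k."""
    nd = {u: nominal_distances(adj, u) for u in adj}  # G is assumed connected
    return sum(nd[i][k] + nd[j][k] for (i, j), k in assignment.items())


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # DM_0 - DM_1 - DM_2 - DM_3
print(message_ambulation(path, {(0, 3): 1, (0, 1): 0}))  # -> 4
```

Placing x_0x_3 on DM_1 costs nd(DM_0, DM_1) + nd(DM_3, DM_1) = 1 + 2, and keeping x_0x_1 with DM_0 costs 0 + 1.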

We can also consider max_i nt(i) as our objective.

Load Balancing with Limited Message Ambulation
Given a graph G, G_Q, and RL^*, find a decomposition that minimizes max_i nt(i) such that

$$\sum_k \sum_{x_i x_j \in J_k} \bigl[ nd(DM_i, DM_k) + nd(DM_j, DM_k) \bigr] \le RL^* .$$


B.  Flexible  organization 


When we choose RL as an objective, the problem is:

Minimal Superposed Link

Given J = x^T Q x and nt_i^*, find a decomposition that minimizes RL such that nt(i) ≤ nt_i^*, i = 1, 2, ..., M.

When we choose max_i nt(i) as an objective, the problem is:

Load Balancing with Limited Superposed Link
Given J = x^T Q x and RL^*, find a decomposition that minimizes max_i nt(i) such that RL ≤ RL^*.

In both problems, the solution turns out to be identical to that for 'Direct Communication'. Even though we allow indirect communication, the best route of the message flow for each cross term is a direct link, because that keeps RL small.


C.  Semi-flexible  organization 

As in the case of 'Fixed organization', RL is

$$\sum_k \sum_{x_i x_j \in J_k} \bigl[ nd(\sigma(x_i), \sigma(x_k)) + nd(\sigma(x_j), \sigma(x_k)) \bigr],$$

once the one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

is determined.


Mapping for Minimal Message Ambulation

Given G_Q, G = (V, E), and nt_i^* for i = 1, 2, ..., M, find a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition that minimize

$$\sum_k \sum_{x_i x_j \in J_k} \bigl[ nd(\sigma(x_i), \sigma(x_k)) + nd(\sigma(x_j), \sigma(x_k)) \bigr]$$

such that nt(i) ≤ nt_i^*, i = 1, 2, ..., M.

Mapping for Load Balancing with Limited Message Ambulation
Given G_Q, G = (V, E), and RL^*, find a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition that minimize max_i nt(i) under the constraint

$$\sum_k \sum_{x_i x_j \in J_k} \bigl[ nd(\sigma(x_i), \sigma(x_k)) + nd(\sigma(x_j), \sigma(x_k)) \bigr] \le RL^* .$$
2.3  Summary


The problem is to find a decomposition J = Σ_i J_i, given a quadratic function J = x^T Q x, so that the objectives

1.  Balance of decomposition

2.  Reducing the amount of communication

are satisfied.

We can view this problem as a combinatorial optimization problem. A decomposition J = Σ_i J_i can be viewed as a mapping from the set of cross terms in J to the set of processors.* (Figure 2-1) The total number of such mappings is finite (at most M^{|E_Q|}, where M is the number of processors and |E_Q| is the number of cross terms).
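For very small instances, the combinatorial view can be taken literally: enumerate all M^{|E_Q|} mappings and keep the best one. The sketch below does this for the Minimal Link problem of a flexible organization under 'Direct Communication', with one common capacity for every processor. It runs in exponential time and is meant only as a reference for checking other methods. The encoding (processors 0, ..., M - 1, DM_i owning x_i) and the names are our own assumptions.

```python
import itertools
from typing import Dict, List, Optional, Tuple

CrossTerm = Tuple[int, int]  # (i, j) stands for the cross term x_i x_j


def total_links(assignment: Dict[CrossTerm, int]) -> int:
    """TL under 'Direct Communication' (distinct processor pairs that must be linked)."""
    links = set()
    for (i, j), k in assignment.items():
        if k in (i, j):
            links.add(frozenset((i, j)))
        else:
            links.update((frozenset((i, k)), frozenset((j, k))))
    return len(links)


def brute_force_minimal_link(
    cross_terms: List[CrossTerm], m: int, cap: int
) -> Optional[Tuple[int, Dict[CrossTerm, int]]]:
    """Scan all m ** len(cross_terms) mappings; keep the smallest TL with every nt(i) <= cap."""
    best = None
    for choice in itertools.product(range(m), repeat=len(cross_terms)):
        if max(choice.count(k) for k in range(m)) > cap:
            continue  # some processor exceeds its capacity
        assignment = dict(zip(cross_terms, choice))
        tl = total_links(assignment)
        if best is None or tl < best[0]:
            best = (tl, assignment)
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
print(brute_force_minimal_link(triangle, 3, 1)[0])  # -> 2
```

For the triangle x_0x_1, x_1x_2, x_0x_2 with one cross term per processor, placing x_0x_2 on DM_1 brings TL down to 2 = M - 1.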

The twelve cases are summarized in the following charts. In the following chapters, we will discuss the nontrivial problems formulated in this chapter.


*  One may imagine splitting a cross term and assigning the pieces to different processors. For example,

$$\begin{aligned}
J   &= x_1^2 + x_2^2 + x_3^2 + 3x_1x_2 + 3x_2x_3 + 3x_3x_1 \\
J_1 &= x_1^2 + x_1x_2 \\
J_2 &= x_2^2 + 2x_1x_2 + 3x_2x_3 \\
J_3 &= x_3^2 + 3x_3x_1
\end{aligned}$$

However, this type of splitting cannot improve any of nt(i), TL, or RL. Therefore, this type of decomposition is not considered, and a decomposition can be viewed as the mapping explained above.
Chapter  3.  DIRECT  COMMUNICATION


In  this  chapter,  the  solution  of  nontrivial  problems  formulated  in  Chapter  2  is 
discussed. 

3.1  TL  as  an  amount  of  communication 


3.1.1  Fixed  organization 


When an organizational structure is fixed and a global task is given, a natural question is whether the organization is able to handle the task. This question must be answered before the issue of balancing loads is discussed. An organization's ability is determined by two criteria. First, each subdivision of the organization must be able to handle the subtask allocated to it. Second, the organizational structure must support the communication among subdivisions that the coupling among subtasks requires. From this point of view, the question is whether there exists a proper decomposition of the given global task for this organization. A proper decomposition requires that the size of each subtask be within the capacity of the subdivision to which it is assigned. A proper decomposition also requires that if a decision of one subdivision affects the task of another subdivision, the two subdivisions have a communication link. The following formulation describes this question of the organization's ability to handle the task:


Let G_Q = (V_Q, E_Q) be the graph describing the structure of J, and let G = (V, E) be the graph describing the fixed organizational structure. Given G_Q = (V_Q, E_Q), G = (V, E) (both graphs are assumed to be connected) and nt_i^*, i = 1, 2, 3, ..., M, does there exist a decomposition J = Σ_i J_i
such that nt(i) ≤ nt_i^*, i = 1, 2, 3, ..., M, and
such that the 'Direct Communication' assumption is satisfied?

The algorithm for this problem consists of two phases. Phase 1 reduces the problem to the well-known max-flow network problem by constructing a corresponding digraph. Phase 2 can be any algorithm that efficiently solves the max-flow problem. The following lemma will often be used to prove the correctness of the algorithm.

Lemma 3-1

If a cross term x_i x_j is to be assigned to DM_i or DM_j, G must have an edge (i, j). If a cross term x_i x_j is to be assigned to DM_k, k ≠ i, j, G must have edges (i, k) and (j, k).

Proof

The proof follows immediately from the 'Direct Communication' assumption.

As mentioned at the end of Chapter 2, a decomposition J = Σ_i J_i can be viewed as a mapping from the set of cross terms in J to the set of processors. Therefore, with the help of this lemma, the problem above can be equivalently stated as follows:


Given G_Q = (V_Q, E_Q) and G = (V, E), does there exist a mapping

$$f : E_Q \to V$$

such that

$$\text{if } f(i,j) = DM_k,\ k \ne i,\ k \ne j, \text{ then } (DM_i, DM_k), (DM_j, DM_k) \in E,$$

$$\text{if } f(i,j) = DM_i \text{ or } f(i,j) = DM_j, \text{ then } (DM_i, DM_j) \in E,$$

and such that |f^{-1}(DM_i)| ≤ nt_i^* for all i?
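A candidate mapping f can be checked directly against these conditions. The following sketch assumes processors numbered 0, ..., M - 1 with DM_i owning x_i, G given as an edge list, and the capacities nt_i^* given as a list. The function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

CrossTerm = Tuple[int, int]  # (i, j) stands for the cross term x_i x_j


def is_proper(f: Dict[CrossTerm, int], edges: Iterable[Tuple[int, int]], cap: List[int]) -> bool:
    """True if f: E_Q -> V satisfies Lemma 3-1 and |f^{-1}(DM_k)| <= nt*_k for all k."""
    e = {frozenset(edge) for edge in edges}
    load = [0] * len(cap)
    for (i, j), k in f.items():
        if k in (i, j):
            if frozenset((i, j)) not in e:          # needs the link DM_i -- DM_j
                return False
        elif frozenset((i, k)) not in e or frozenset((j, k)) not in e:
            return False                            # needs DM_i -- DM_k and DM_j -- DM_k
        load[k] += 1
    return all(used <= limit for used, limit in zip(load, cap))


path_edges = [(0, 1), (1, 2)]                       # G: DM_0 - DM_1 - DM_2
f = {(0, 1): 0, (0, 2): 1, (1, 2): 2}
print(is_proper(f, path_edges, [1, 1, 1]))            # -> True
print(is_proper({(0, 2): 0}, path_edges, [1, 1, 1]))  # -> False (no edge (0, 2))
```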

The  following  algorithm  solves  this  problem. 

Let  M  =  | V|,  the  number  of  processors. 

Algorithm 3.1
Phase 1

1.  Create |E_Q| nodes corresponding to the cross terms of J. Let m_ij denote the node corresponding to x_i x_j.

2.  Create M nodes corresponding to the processors. Let n_i denote the node corresponding to DM_i.

3.  For each cross term x_i x_j, do

    If (DM_i, DM_j) ∈ E, make an edge from m_ij to n_i and an edge from m_ij to n_j.

    For each neighbor DM_k ≠ DM_j of DM_i in G,

        if (DM_k, DM_j) ∈ E, make an edge from m_ij to n_k.

    If no edge is made for x_i x_j, terminate with the output:

        “Organization cannot handle this task.”

    Let all the edges created in Step 3 have infinite capacity.

4.  Create the source node s and make an edge from s to each m_ij. Each edge (s, m_ij) created in Step 4 has capacity 1.

5.  Create the sink node t and make an edge from each n_i to t. Each edge (n_i, t) created in Step 5 has capacity nt_i^*.

The digraph constructed by Phase 1 is shown in Figure 3-1. Let E_3 be the set of edges produced in Step 3. Then (m_ij, n_k) ∈ E_3 if and only if x_i x_j can be assigned to DM_k without violating Lemma 3-1. If no edge is constructed for some cross term, that cross term cannot be assigned to any processor, so the organization cannot handle the task (the termination line of Step 3).

Phase 2

1.  Run the Ford-Fulkerson labeling algorithm on the constructed digraph.

    (Set an integer initial feasible flow on each link; each flow can be set to zero initially.)

2.  If the maximum flow is |E_Q|, the organization can handle this task.

3.  If the maximum flow is less than |E_Q|, the organization cannot handle this task.
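The two phases can be sketched in Python as follows. This is an illustration under our own assumptions, not the author's code:

- Processors are numbered 0, ..., M - 1, with DM_i owning x_i.
- G is given as a dictionary of neighbour sets.
- The 'infinite' capacities of Step 3 are replaced by |E_Q|, which is safe because the total flow can never exceed that value.
- Phase 2 uses breadth-first augmenting paths (the Edmonds-Karp choice), which is one particular way of running Ford-Fulkerson.

All function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set, Tuple

CrossTerm = Tuple[int, int]  # (i, j) stands for the cross term x_i x_j


def candidates(term: CrossTerm, adj: Dict[int, Set[int]]) -> List[int]:
    """Phase 1, Step 3: processors that may hold x_i x_j without violating Lemma 3-1."""
    i, j = term
    ks = [i, j] if j in adj[i] else []
    ks += [k for k in adj[i] if k != j and k in adj[j]]  # common neighbours DM_k
    return ks


def max_flow(residual, s, t) -> int:
    """Ford-Fulkerson with breadth-first augmenting paths; updates `residual` in place."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in list(residual[u].items()):
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total  # no augmentation path left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[u][v] for u, v in path)  # delta = min push(e) on the path
        for u, v in path:
            residual[u][v] -= delta
            residual[v][u] += delta
        total += delta


def algorithm_3_1(cross_terms: List[CrossTerm], adj: Dict[int, Set[int]],
                  cap: List[int]) -> Optional[Dict[CrossTerm, int]]:
    """Return a proper f: E_Q -> V, or None if the organization cannot handle the task."""
    big = len(cross_terms)  # stands in for infinite capacity: flow never exceeds |E_Q|
    residual = defaultdict(lambda: defaultdict(int))
    for term in cross_terms:
        ks = candidates(term, adj)
        if not ks:
            return None                              # termination line of Step 3
        residual["s"][("m", term)] = 1               # Step 4
        for k in ks:
            residual[("m", term)][("n", k)] = big    # Step 3
    for k, c in enumerate(cap):
        residual[("n", k)]["t"] = c                  # Step 5
    if max_flow(residual, "s", "t") < len(cross_terms):
        return None                                  # Phase 2, Step 3
    # A unit of flow on (m_ij, n_k) appears as residual capacity 1 on (n_k, m_ij).
    return {term: k for term in cross_terms for k in candidates(term, adj)
            if residual[("n", k)][("m", term)] == 1}


adj = {0: {1}, 1: {0, 2}, 2: {1}}                    # G: DM_0 - DM_1 - DM_2
terms = [(0, 1), (1, 2), (0, 2)]
print(algorithm_3_1(terms, adj, [1, 1, 1]))  # -> {(0, 1): 0, (1, 2): 2, (0, 2): 1}
print(algorithm_3_1(terms, adj, [1, 0, 1]))  # -> None
```

On the path DM_0 - DM_1 - DM_2, the cross term x_0x_2 can only go to DM_1, so with unit capacities the flow forces the assignment shown. With nt_1^* = 0 the maximum flow is 2 < |E_Q| = 3.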

Let us prove that this algorithm is correct. The Ford-Fulkerson algorithm [6] updates flows along an 'augmentation path'. An augmentation path is a special path from the source to the sink in the undirected graph obtained from the original graph by ignoring arc directions. The augmentation path satisfies the following two conditions:

1.  If the direction of the original arc is the same as the direction of the augmentation path, the flow through the original arc is strictly less than its capacity.

2.  If the direction of the original arc is opposite to the direction of the augmentation path, the flow through the original arc is strictly greater than zero.


Let P_a be an augmentation path at a certain iteration of the Ford-Fulkerson algorithm, and let e be an edge of P_a. Let the push flow through e be

$$\mathrm{push}(e) = \begin{cases} \text{capacity of arc} - \text{actual flow}, & \text{if } e \text{ has the same direction as the original arc}, \\ \text{actual flow}, & \text{if } e \text{ has the opposite direction to the original arc}. \end{cases}$$

The Ford-Fulkerson algorithm finds an augmentation path at each iteration and pushes flow in the direction of the augmentation path by the amount

$$\delta = \min_{e \in P_a} \mathrm{push}(e).$$

Since all the arc capacities of the graph in Figure 3-1 are integers, δ is an integer whenever the initial flows are integers. Inductively, all the flows remain integers throughout the run of the algorithm.

If the maximum flow is |E_Q|, the flow through every arc leaving the source s must be 1, because there are |E_Q| such arcs, each with capacity 1. By flow conservation, and because the flow through every arc is an integer, each node m_ij then has exactly one outgoing edge (m_ij, n_k) carrying flow 1. Assigning each cross term x_i x_j to the corresponding DM_k is therefore a feasible assignment: Lemma 3-1 holds because the edge (m_ij, n_k) was created in Step 3, and the capacity of (n_k, t) bounds the number of cross terms assigned to DM_k.

If the maximum flow is less than |E_Q|, we claim that no assignment of cross terms can satisfy both Lemma 3-1 and the capacity constraints of the processors. We prove this by contradiction. Suppose there is a satisfactory assignment. For each cross term x_i x_j, we can find the processor DM_k to which it is assigned, and we can then set the flows through (s, m_ij) and (m_ij, n_k) to 1. Since the assignment of cross terms satisfies the capacity constraints of the processors, the flow through the arcs into the sink t is within the capacities of these arcs. This is a feasible flow of value |E_Q|, so the maximum flow is at least |E_Q|. Contradiction.

Thus, Algorithm 3.1 is proved.

Phase 1 of Algorithm 3.1 takes O(|E_Q|M) time, because for each cross term every node must be checked to see whether it can handle the cross term. One iteration of the Ford-Fulkerson algorithm (finding and using one augmentation path) takes O(|A|) time, where A is the set of edges of the graph in Figure 3-1, and |A| is no greater than |E_Q| + M + |E_Q|M. Therefore, for our problem, each iteration of Step 1 of Phase 2 takes O(|E_Q|M) time. Since each augmentation increases the flow by at least one and the maximum flow is no greater than |E_Q|, Phase 2 takes O(|E_Q|^2 M) time. Algorithm 3.1 therefore has running time O(|E_Q|^2 M). (At Step 1 of Phase 2, any max-flow algorithm can be used as long as it generates an integral solution. If such an algorithm other than Ford-Fulkerson is used, the time complexity may be different.)

We can extend Algorithm 3.1 to obtain a balanced allocation of the global cost function, given a fixed processor network. Let us say that the best-balanced allocation is the one that minimizes the load of the most heavily loaded processor.

Load Balancing in a Fixed Structure

Given G_Q = (V_Q, E_Q) and a graph G = (V, E), find a decomposition that minimizes max_i nt(i).

We can solve this problem by running Phase 2 of Algorithm 3.1 repeatedly with increasing processor capacities, as in the following Algorithm 3.2.

Algorithm 3.2


1.  Run Steps 1 through 4 of Phase 1 of Algorithm 3.1.
    Set nt^* =

2.  Create the sink node t and make an edge from each n_i to t.

3.  Set the capacity of each edge (n_i, t) to nt^*.

4.  Run the Ford-Fulkerson labeling algorithm on the constructed digraph.

    (Set an integer initial feasible flow on each link; each flow can be set to zero initially.)

    If the maximum flow is |E_Q|, stop
    (min max_i nt(i) = nt^*);

    If the maximum flow is less than |E_Q|, set nt^* := nt^* + 1 and
    go to Step 3.

We know that the algorithm terminates before nt^* becomes greater than |E_Q|, because min max_i nt(i) is no greater than |E_Q|. Therefore, the time complexity of Algorithm 3.2 is O(|E_Q|^3 M). In Step 4 of Algorithm 3.2, if we perform a binary search on nt^* rather than increasing it by one at each iteration, we need to run the Ford-Fulkerson algorithm only O(log |E_Q|) times. Therefore, the complexity of Algorithm 3.2 can be improved to O(M |E_Q|^2 log |E_Q|).
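A minimal sketch of Algorithm 3.2 with the binary-search refinement follows. To keep it short, we make two substitutions, both of which are our own assumptions. First, the max-flow test on the digraph of Figure 3-1, with one common capacity nt^* for every processor, is written as a capacitated bipartite matching with depth-first augmenting paths. This is Ford-Fulkerson specialized to this unit-supply network. Second, the search starts from nt^* = 0 rather than from any particular initial value. The processor numbering (0, ..., M - 1, DM_i owning x_i) and the function names are hypothetical, and nt(i) counts cross terms only.

```python
from typing import Dict, List, Optional, Set, Tuple

CrossTerm = Tuple[int, int]  # (i, j) stands for the cross term x_i x_j


def candidates(term: CrossTerm, adj: Dict[int, Set[int]]) -> List[int]:
    """Processors that may hold x_i x_j under Lemma 3-1."""
    i, j = term
    ks = [i, j] if j in adj[i] else []
    return ks + [k for k in adj[i] if k != j and k in adj[j]]


def feasible(cands: Dict[CrossTerm, List[int]], m: int, limit: int) -> bool:
    """Can every cross term be placed so that no processor holds more than `limit`?"""
    held: Dict[int, List[CrossTerm]] = {k: [] for k in range(m)}

    def augment(term: CrossTerm, seen: Set[int]) -> bool:
        for k in cands[term]:
            if k in seen:
                continue
            seen.add(k)
            if len(held[k]) < limit:
                held[k].append(term)
                return True
            for other in list(held[k]):  # try to move a resident term elsewhere
                if augment(other, seen):
                    held[k].remove(other)
                    held[k].append(term)
                    return True
        return False

    return all(augment(term, set()) for term in cands)


def algorithm_3_2(cross_terms: List[CrossTerm], adj: Dict[int, Set[int]]) -> Optional[int]:
    """Smallest nt* that passes the test, i.e. min max_i nt(i); None if infeasible."""
    cands = {term: candidates(term, adj) for term in cross_terms}
    if any(not ks for ks in cands.values()):
        return None
    lo, hi = 0, len(cross_terms)  # nt* = |E_Q| always passes
    while lo < hi:                # binary search instead of nt* := nt* + 1
        mid = (lo + hi) // 2
        if feasible(cands, len(adj), mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


adj = {0: {1}, 1: {0, 2}, 2: {1}}                     # G: DM_0 - DM_1 - DM_2
print(algorithm_3_2([(0, 1), (1, 2), (0, 2)], adj))  # -> 1
```

Marking each processor as visited at most once per search keeps every augmentation at O(|E_Q|M), in line with the bound used above.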

When the fixed organizational structure happens to be a tree, we present a special algorithm to minimize max_i nt(i).

Fixed Tree:

Given G_Q (a graph describing a global cost function) and T = (V, E_T) (a tree-structured network of decision makers), find a decomposition that minimizes max_i nt(i).

The  following  lemma  is  used  in  order  to  analyze  the  algorithm. 


Lemma 3-2

Suppose the organizational structure is a tree, and DM_i is responsible for x_i.

1.  If DM_i and DM_j are directly linked, i.e. (DM_i, DM_j) ∈ E_T, the cross term x_i x_j is assigned either to DM_i or to DM_j.

2.  If DM_i and DM_j are connected by exactly two hops, i.e. there exists DM_k such that (DM_i, DM_k), (DM_j, DM_k) ∈ E_T and (DM_i, DM_j) ∉ E_T, then x_i x_j is assigned to DM_k.

Proof

1.  Suppose DM_i and DM_j are directly linked, and x_i x_j is assigned to DM_k, k ≠ i, j. Then J_k depends upon x_i and x_j. Therefore, by the 'Direct Communication' assumption, (DM_i, DM_k) ∈ E_T and (DM_j, DM_k) ∈ E_T. Therefore, DM_i, DM_j, and DM_k form a cycle. Contradiction.

2.  Suppose DM_i and DM_j are connected through DM_k. If x_i x_j is assigned to DM_i or DM_j, then by the 'Direct Communication' assumption, (DM_i, DM_j) ∈ E_T. Therefore, DM_i, DM_j, and DM_k form a cycle. Contradiction. If x_i x_j is assigned to DM_l, l ≠ i, j, k, then by the 'Direct Communication' assumption, (DM_i, DM_l) ∈ E_T and (DM_j, DM_l) ∈ E_T. Therefore, DM_i, DM_j, DM_k, and DM_l form a cycle. Contradiction.

Q.E.D.

Algorithm 3.3

Given a tree T = (V, E_T) and J(x_1, x_2, ..., x_M),

Step 1. Check whether the given tree can handle the cost function. In other words, for all cross terms x_i x_j, check whether DM_i and DM_j are within two hops.

Step 2. Assign each square term x_i^2 to J_i.

    For each cross term x_i x_j, if there exists DM_k such that (DM_i, DM_k) ∈ E_T and (DM_j, DM_k) ∈ E_T, x_i x_j is assigned to J_k.

Step 3. Let us define

    nec(i) = the number of terms necessarily assigned to DM_i by Step 2
    mx = max_i nec(i)
    L(G) (leaves) = the set of nodes of a graph G whose degree is 1
    B(G) = the set of edges incident upon the nodes in L(G)

$$T(n) = (V(n), E_T(n)) = \begin{cases} T, & n = 1, \\ \bigl( V(n-1) - L(T(n-1)),\; E_T(n-1) - B(T(n-1)) \bigr), & n \ge 2. \end{cases}$$

    For iteration n from 1 to r, where T(r) = ∅, do the following:

    for all pairs i, j such that DM_i ∈ L(T(n)), DM_j ∈ T(n), and (DM_i, DM_j) ∈ E_T(n)

      a) if nec(i) < mx,

            x_i x_j is assigned to DM_i
            nt(i) = nec(i) + 1 ; frozen

      b) if nec(i) = mx and nec(j) < mx,

            x_i x_j is assigned to DM_j
            nt(i) = nec(i) ; frozen
            nec(j) := nec(j) + 1

      c) if nec(i) = mx and nec(j) = mx,

            x_i x_j is assigned to DM_i
            nt(i) = nec(i) + 1 = mx + 1 ; frozen

    (The term 'frozen' means that no more cross terms can be assigned to the processor; therefore, nt(i) is set.)
Explanation:

Step 1. For some trees, no decomposition can satisfy our assumptions. See Figure 3-2 for an example.

Step 2. This step assigns to each $DM_i$ all the terms that $J^i$ must necessarily have according to the second point of Lemma 3-2. (At the end of Step 2, we are left with at most $M - 1$ cross terms to assign. These cross terms are characterized as $x_i x_j$ such that $(DM_i, DM_j) \in E_T$.)

Step 3. At each iteration, we take the leaves and their neighbors (a leaf is defined as a node whose degree is 1) and eliminate the leaves together with their incident edges. The number of cross terms assigned to each leaf is frozen when the leaf is removed. Step 3 is thus a strategy for assigning the cross terms corresponding to the edges of the tree.
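Algorithm 3.3 is short enough to prototype. The following is a minimal Python sketch, not the author's code, and it rests on a few assumptions not stated in the text: the tree is an adjacency dictionary, each cross term is a `frozenset({i, j})`, $nec$ and $nt$ count cross terms only (square terms add the same constant to every processor), and a tree edge whose cross term is absent from $J$ simply freezes the leaf with $nt(i) = nec(i)$. Steps 1 and 2 are merged into one pass. The function name `algorithm_3_3` is hypothetical.

```python
def algorithm_3_3(adj, cross_terms):
    """Sketch of Algorithm 3.3: balance cross terms over the nodes of a tree.

    adj         -- dict: node -> set of neighbours (assumed to form a tree)
    cross_terms -- set of frozenset({i, j}), one per cross term x_i x_j in J
    Returns (assignment, nt): term -> node holding it, node -> cross-term count.
    """
    adj = {v: set(nb) for v, nb in adj.items()}   # copy; Step 3 mutates it
    assignment = {}
    nec = {v: 0 for v in adj}

    # Steps 1 and 2: a pair two hops apart must go to its unique connector DM_k
    for term in cross_terms:
        i, j = tuple(term)
        if j in adj[i]:
            continue                              # tree edge: decided in Step 3
        common = adj[i] & adj[j]
        if not common:
            raise ValueError(f"DM{i} and DM{j} are more than two hops apart")
        (k,) = common                             # unique in a tree
        assignment[term] = k
        nec[k] += 1

    m_x = max(nec.values())
    nt = {}

    # Step 3: strip the leaves of T(n) until no edge is left
    while any(adj.values()):
        leaves = [v for v, nb in adj.items() if len(nb) == 1]
        if not leaves:
            raise ValueError("adjacency does not describe a tree")
        for i in leaves:
            if len(adj[i]) != 1:                  # its partner was stripped first
                continue
            (j,) = adj[i]
            term = frozenset((i, j))
            if term not in cross_terms:           # Q_ij = 0: nothing to place
                nt[i] = nec[i]
            elif nec[i] < m_x:                    # case a)
                assignment[term] = i
                nt[i] = nec[i] + 1
            elif nec[j] < m_x:                    # case b)
                assignment[term] = j
                nt[i] = nec[i]
                nec[j] += 1
            else:                                 # case c)
                assignment[term] = i
                nt[i] = m_x + 1
            adj[j].discard(i)                     # remove the leaf and its edge;
            adj[i].clear()                        # nt[i] is now frozen

    for v in adj:
        nt.setdefault(v, nec[v])                  # the last remaining node
    return assignment, nt
```

On the path 1-2-3-4 with all five admissible cross terms, Step 2 places $x_1 x_3$ on $DM_2$ and $x_2 x_4$ on $DM_3$, so $m_x = 1$; the leaves take $x_1 x_2$ and $x_3 x_4$ under case a), and the last edge triggers case c), giving $\max_i nt(i) = m_x + 1 = 2$.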

Theorem  3.3 

Algorithm 3.3 minimizes $\max_i nt(i)$ for a fixed tree in polynomial time, and $m_x \le \max_i nt(i) \le m_x + 1$.

Proof 

First, we claim that at any iteration $n$, $nec(i) \le m_x$ for any leaf $DM_i$ as long as $nt(i)$ has not been frozen. Suppose not. Let $DM_i$ be the first node in the progression of Algorithm 3.3 such that $DM_i$ is a leaf and $nec(i) > m_x$. Since $nec(k) \le m_x$ for all processors $DM_k$ before Step 3, the only way this can happen is that $nec(i) = m_x$ at a certain time $n$ and a cross term $x_i x_l$ is assigned to $DM_i$. $DM_l$ must be a leaf incident on $DM_i$ when this happens, and this $DM_l$ is frozen as the assignment of $x_i x_l$ is made (Figure 3-3a). At this time $n$, $nec(l) \le m_x$ because we defined $DM_i$ to be the first node whose $nec$ exceeds $m_x$ while being a leaf. Since $nec(i)$ is $m_x$ and $DM_l$ is a leaf incident upon $DM_i$, $x_i x_l$ cannot be assigned to $DM_i$: case b) is the only case that assigns a term to the non-leaf endpoint, and it requires $nec(i) < m_x$. Contradiction.
Secondly, if no instance of Step 3-c) occurs, $nt(k) \le m_x$ for each processor $DM_k$. Since the processor attaining $m_x$ in Step 2 already holds $m_x$ terms, $\max_k nt(k) = m_x$.

Finally, if Step 3-c) occurs at some iteration $n$ (Figure 3-3b), we know from the first claim that $\max_k nt(k) = m_x + 1$ at the end of the algorithm. We claim that $\min \max_k nt(k)$ is indeed $m_x + 1$ for this global cost function. Let us say that $DM_i$ is a leaf with $nec(i) = m_x$, and $DM_j$ is its neighbor with $nec(j) = m_x$. If $nec(i)$ and $nec(j)$ are both already $m_x$ before Step 3 begins, then obviously $\max_k nt(k) \ge m_x + 1$, because $x_i x_j$ must be assigned to either $DM_i$ or $DM_j$ by Lemma 3-2. Therefore, $\max_k nt(k) = m_x + 1$. Even if $nec(i) < m_x$ or $nec(j) < m_x$ before Step 3, we cannot make $nec(i)$ or $nec(j)$ less than $m_x$ at time $n$ while keeping $nt(k) \le m_x$ for all $k$. For example, if $nec(i) < m_x$ before Step 3, some cross terms of the form $x_i x_l$ must have been assigned to $DM_i$ at certain iterations before $n$. Take an arbitrary such $l$, and say $x_i x_l$ has been assigned to $DM_i$ at iteration $n_l < n$. Then, at iteration $n_l$, $DM_l$ is a leaf incident on $DM_i$, and $nec(l) = m_x$ (Figure 3-3c). If $nec(l)$ is $m_x$ before Step 3, $x_i x_l$ must be assigned to $DM_i$ as long as we try to make $\max_k nt(k)$ less than $m_x + 1$. If this is the case for all $l$, $nec(i)$ has to be $m_x$ at time $n$ in order to keep $nt(k) \le m_x$ for all $k$. If for some $l$, say $l_1$, $nec(l_1) < m_x$ before Step 3, there must have been some cross terms of the form $x_{l_1} x_p$ that have been assigned to $DM_{l_1}$ at some iteration $n_p < n_{l_1}$. Then, at iteration $n_p$, $DM_p$ is a leaf incident on $DM_{l_1}$, and $nec(p) = m_x$ (Figure 3-3d). If $nec(p)$ is $m_x$ before Step 3, $x_{l_1} x_p$ must be assigned to $DM_{l_1}$ as long as we try to make $\max_k nt(k)$ less than $m_x + 1$. If this is the case for all $p$, $nec(l_1)$ has to be $m_x$ at time $n_{l_1}$ in order to keep $nt(k) \le m_x$ for all $k$. If for some $p$, say $p_1$, $nec(p_1) < m_x$ before Step 3, there must have been some cross terms of the form $x_{p_1} x_q$ that have been assigned to $DM_{p_1}$ at some iteration $n_q < n_{p_1}$, with $nec(q) = m_x$ at iteration $n_q$. If we keep repeating this argument, we eventually reach a node $DM_s$ such that $nec(s)$ is $m_x$ before Step 3, because the graph $G$ is a finite tree. Therefore, $nec(i)$ (and, by the same argument, $nec(j)$) must be $m_x$ at iteration $n$ in order to keep $nt(k) \le m_x$ for all $k$, and then $x_i x_j$ pushes $DM_i$ or $DM_j$ to $m_x + 1$. Therefore $\min \max_k nt(k) \ge m_x + 1$. Since Algorithm 3.3 attains $\max_k nt(k) = m_x + 1$, the optimum is exactly $m_x + 1$.
Q.E.D.

3.1.2  Flexible  organization 

Minimal  Link 

Given $G_Q = (V_Q, E_Q)$ and $nt^*$, find a decomposition that minimizes $TL$ such that

$$nt(i) \le nt^*, \quad i = 1, 2, \ldots, |V_Q|$$

Load  Balancing  with  Limited  Link 

Given $G_Q = (V_Q, E_Q)$ and $TL^*$, find a decomposition that minimizes $\max_i nt(i)$ such that $TL \le TL^*$.

The recognition versions (decision problems) of these two problems are identical.

INSTANCE
$G_Q = (V_Q, E_Q)$, $M = |V_Q|$
$TL^*$, a bound on the total number of necessary links
$nt^*$, a bound on the maximum of $nt(i)$ over $i = 1, 2, \ldots, M$

PROBLEM

Does there exist a decomposition with total number of necessary links at most $TL^*$ and $\max_i nt(i) \le nt^*$?


No efficient algorithm for ‘Minimal Link’ or ‘Load Balancing with Limited Link’ has been found. We conjecture that the recognition version of these problems is NP-complete.


An efficient algorithm can be designed, however, for a special case of the ‘Minimal Link’ problem. When the global cost function is $J = x^T Q x$, where $Q$ is a bandwidth-limited matrix, dynamic programming can be used, provided that an additional constraint is imposed on the assignment of existing cross terms.

Minimal Link for bandwidth limited cost
Given $J = x^T Q x$, where $Q$ is bandwidth limited by a fixed integer $W$ ($Q_{ij} = 0$ if $|i - j| \ge W$),

find a decomposition $J = \sum_k J^k$ that minimizes $TL$ subject to the following constraints:

Constraint 1: $nt(i) \le nt^*$

Constraint 2: any cross term $x_i x_j$ ($i < j$) can be only in subcost function $J^k$, where $k$ satisfies $j - W + 1 \le k \le j$. (See Figure 3-4.)

The special feature of this problem is that the interaction between variables is local. If two variables $x_i$, $x_j$ are sufficiently distant ($|i - j| \ge W$), the two processors (or divisions of an organization) responsible for these variables can make decisions independently of each other. Here is the reason. Without loss of generality, let us assume $i < j$. For any cross term of the form $x_i x_w$ in the cost function $J$, $|i - w| < W$. If $i < w < j$, $x_i x_w$ can be assigned to processors $DM_{w-W+1}, DM_{w-W+2}, \ldots, DM_w$, but not $DM_j$ (Figure 3-5a). If $w < i < j$, $x_w x_i$ can be assigned to processors $DM_{i-W+1}, DM_{i-W+2}, \ldots, DM_i$, but not $DM_j$ (Figure 3-5b). Therefore, no cross term of the form $x_i x_w$ can be in $J^j$. By the same token, no cross term of the form $x_j x_w$ can be in $J^i$. Therefore, $J^i$ does not depend upon $x_j$, and $J^j$ does not depend upon $x_i$. Dynamic programming takes advantage of this locality of interaction.

Decomposing a cost function can be viewed as assigning cross terms to processors. Let us break up this labor of assigning cross terms into $M$ stages, where $M$ is the number of variables. At each stage $t \in \{1, 2, 3, \ldots, M\}$, we assign all the cross terms of the form

$$x_l x_t, \quad t - W + 1 \le l < t.$$

Because of Constraint 2, these cross terms can only be assigned to

$$DM_{t-W+1}, DM_{t-W+2}, \ldots, DM_t$$

(Figure 3-4). Therefore, the number of choices for a decision at each stage is at most $W^{W-1}$ ($W$ processors to choose from for each of at most $W - 1$ cross terms of the form $x_l x_t$, $t - W + 1 \le l < t$). This gives hope for a polynomial-time dynamic programming algorithm because $W$ is fixed. A state at each stage should contain the number of cross terms assigned to each processor up to that stage. A state also includes the location of the communication links required by the assignments of cross terms up to that stage.
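To make the count of choices concrete, here is a small sketch that enumerates the decisions available at one stage. It uses the encoding introduced below for $d(t)$ ($d_i = 0$ if $x_{t-W+i} x_t$ is absent, $d_i = k$ if it goes to $DM_{t-W+k}$). The predicate `present(l, t)` is a hypothetical caller-supplied stand-in for $Q_{lt} \ne 0$, and processors with index below 1 are skipped near the start of the chain.

```python
from itertools import product

def stage_decisions(t, W, present):
    """Enumerate the decision vectors d(t) = (d_1(t), ..., d_{W-1}(t)).

    present(l, t) -- hypothetical predicate: True when x_l x_t appears in J.
    d_i = 0 means x_{t-W+i} x_t is absent; d_i = k means it goes to DM_{t-W+k}.
    """
    options = []
    for i in range(1, W):
        l = t - W + i
        if l >= 1 and present(l, t):
            # Constraint 2: only DM_{t-W+1}, ..., DM_t, and only existing ones
            options.append([k for k in range(1, W + 1) if t - W + k >= 1])
        else:
            options.append([0])
    return list(product(*options))
```

With $W = 3$ and every term present, an interior stage has $3^2 = 9$ decisions; at $t = 2$ only $x_1 x_2$ exists and may go to $DM_1$ or $DM_2$, so the decisions are `(0, 2)` and `(0, 3)`.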

Let us describe the dynamic programming rigorously. Let the vector $U(t) \in \mathbb{Z}^W$ describe the number of terms assigned to processors $DM_{t-W+1}, DM_{t-W+2}, \ldots, DM_t$ until the beginning of step $t$. More precisely,

$$U_i(t) = \begin{cases} \text{the number of cross terms assigned to } DM_{t-W+i} \text{ until the beginning of step } t & \text{if } t - W + i \ge 1 \\ 0 & \text{if } t - W + i \le 0 \end{cases}$$

Notice that only cross terms of the form $x_l x_s$, $l < s < t$, are assigned to processors until the beginning of step $t$. By Constraint 2, such a cross term cannot be assigned to processor $DM_t$. Therefore, $DM_t$ does not have any cross term at the beginning of step $t$. Consequently, $U_W(t) = 0$ for all $t = 1, 2, \ldots, M$. Also, since no assignment has been made at the beginning of step 1, $U(1) = 0$ (Figure 3-6).

Let the matrix $V(t) \in \{0, 1\}^{W \times W}$ describe the link structure necessary for the assignments done until the beginning of step $t$. Because links are undirected, only the upper triangle is used: $V(t)$ is an upper triangular matrix such that, for $1 \le i < j \le W$,

$$V_{ij}(t) = \begin{cases} 1 & \text{if link } (DM_{t-W+i}, DM_{t-W+j}) \text{ is necessary for the cross terms assigned until the beginning of step } t \\ 0 & \text{otherwise} \end{cases}$$

Notice that $V_{iW}(t) = 0$ for $i = 1, 2, \ldots, W-1$, because no cross term involving $x_t$ has been assigned and $DM_t$ holds no cross term until the beginning of step $t$. Also, $V(1) = 0$ because no assignment has been made at the beginning of step 1, so no link is necessary.


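Let $d(t) \in \{0, 1, 2, \ldots, W\}^{W-1}$ describe the decision made at stage $t$:

$$d(t) = \begin{pmatrix} d_1(t) \\ \vdots \\ d_i(t) \\ \vdots \\ d_{W-1}(t) \end{pmatrix}$$

The decision $d_i(t)$ is where to assign the cross term of the form $x_{t-W+i} x_t$, $i = 1, 2, \ldots, W-1$:

$$d_i(t) = \begin{cases} 0 & \text{if } x_{t-W+i} x_t \text{ is not in } J \\ k & \text{if } x_{t-W+i} x_t \text{ is assigned to } DM_{t-W+k} \end{cases} \qquad i = 1, 2, \ldots, W-1, \quad k = 1, 2, \ldots, W$$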


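Now we are ready to write an equation that describes the state evolution. Let us define a vector $C(d(t)) \in \mathbb{Z}^W$:

$$C_i(d(t)) = \text{the number of components of } d(t) \text{ equal to } i$$

$C_i(d(t))$ is the number of cross terms newly added to $DM_{t-W+i}$ at step $t$. Since $U_i(t+1)$ refers to $DM_{t+1-W+i} = DM_{t-W+(i+1)}$, the evolution of $U(t)$ is

$$U_i(t+1) = \begin{cases} U_{i+1}(t) + C_{i+1}(d(t)) & \text{for } i = 1, 2, \ldots, W-1 \\ 0 & \text{for } i = W \end{cases}$$

(See Figure 3-7.) Because $DM_{t-W+1}$ receives no cross terms after step $t$, its final count $U_1(t) + C_1(d(t))$ must already satisfy Constraint 1 when it leaves the window.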

$V_{ij}(t+1)$ tells whether there is a link between $DM_{t+1-W+i}$ and $DM_{t+1-W+j}$ at the beginning of step $t+1$. Let us consider the case $i < j < W - 1$ first (see Figure 3-8a). If the link $(DM_{t+1-W+i}, DM_{t+1-W+j})$ is to exist, $J^{t+1-W+i}$ should have a term involving $x_{t+1-W+j}$, or $J^{t+1-W+j}$ should have a term involving $x_{t+1-W+i}$. Since cross terms of the form

$$x_l x_t, \quad t - W + 1 \le l < t$$

are assigned at step $t$, the only ways this link can be newly added at step $t$ are the following:

1) $x_{t+1-W+i} x_t$ is assigned to $DM_{t+1-W+j}$, or

2) $x_{t+1-W+j} x_t$ is assigned to $DM_{t+1-W+i}$.

Now, let us consider the case $i < j = W - 1$ (see Figure 3-8b). In this case, $t + 1 - W + j = t$. If the link $(DM_{t+1-W+i}, DM_t)$ is to exist, $J^{t+1-W+i}$ should have a term involving $x_t$, or $J^t$ should have a term involving $x_{t+1-W+i}$. Since cross terms of the form

$$x_l x_t, \quad t - W + 1 \le l < t$$

are assigned at step $t$, the only ways this link can be newly added at step $t$ are the following:

1) $x_{t+1-W+i} x_t$ is assigned to $DM_{t+1-W+j} = DM_t$, or

2) $x_l x_t$ is assigned to $DM_{t+1-W+i}$ for some $t - W + 1 \le l < t$.


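For $i < j = W$, as mentioned before, $V_{ij}(t+1) = 0$. Hence, for $i = 1, 2, \ldots, j - 1$,

$$V_{ij}(t+1) = \begin{cases} V_{i+1,\,j+1}(t) \lor \big(d_{i+1}(t) = j+1\big) \lor \big(d_{j+1}(t) = i+1\big) & \text{for } j = 2, \ldots, W-2 \\ \big(d_{i+1}(t) = W\big) \lor \big(d_k(t) = i+1 \text{ for some } 1 \le k \le W-1\big) & \text{for } j = W-1 \\ 0 & \text{for } j = W \end{cases}$$

where $\lor$ is Boolean ‘OR’.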
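The state evolution can be checked mechanically. The sketch below (hypothetical function names; relative slots $1, \ldots, W$ with list index 0 unused; $V(t)$ stored as the set of pairs $(i, j)$, $i < j$, with $V_{ij}(t) = 1$) computes $C(d(t))$, $U(t+1)$ and $V(t+1)$, and also counts the links that step $t$ adds. It assumes, per the ‘Direct Communication’ assumption, that the holder of $x_{t-W+i} x_t$ needs links to both $DM_{t-W+i}$ and $DM_t$.

```python
def transition(U, V, d, W):
    """One step of the state evolution in relative slots 1..W (index 0 unused).

    U -- list of length W + 1, U[i] = U_i(t)
    V -- set of pairs (i, j), i < j, with V_ij(t) = 1
    d -- list of length W, d[i] = d_i(t) for i = 1..W-1
    """
    C = [0] * (W + 1)                 # C[k]: new cross terms for DM_{t-W+k}
    for i in range(1, W):
        if d[i]:
            C[d[i]] += 1
    # Slot i at step t+1 is DM_{t+1-W+i}, i.e. slot i+1 at step t.
    U_next = [0] * (W + 1)            # U_next[W] = 0: DM_{t+1} holds nothing yet
    for i in range(1, W):
        U_next[i] = U[i + 1] + C[i + 1]
    V_next = set()
    for j in range(2, W):             # j = W always gives V_iW(t+1) = 0
        for i in range(1, j):
            if j <= W - 2:
                on = ((i + 1, j + 1) in V or d[i + 1] == j + 1
                      or d[j + 1] == i + 1)
            else:                     # j = W - 1: slot j is DM_t itself
                on = d[i + 1] == W or any(d[k] == i + 1 for k in range(1, W))
            if on:
                V_next.add((i, j))
    return U_next, V_next

def step_cost(V, d, W):
    """Links required at step t (slots of step t) that are not already in V(t)."""
    needed = set()
    for i in range(1, W):
        k = d[i]
        if k == 0:
            continue                  # x_{t-W+i} x_t is not in J
        for other in (i, W):          # DM_{t-W+k} must reach DM_{t-W+i} and DM_t
            if other != k:
                needed.add((min(k, other), max(k, other)))
    return len(needed - V)
```

For example, with $W = 3$, $U(t) = (0, 1, 0)$, an existing link between slots 1 and 2, and $d(t) = (3, 2)$, the new state is $U(t+1) = (2, 1, 0)$ with the single link $(1, 2)$, i.e. between $DM_{t-1}$ and $DM_t$, and step $t$ adds two links.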


Let us define

$$\text{cost}(V(t), d(t)) = \text{the number of newly added links at step } t,$$

then $TL = \sum_{t=1}^{M} \text{cost}(V(t), d(t))$.

$(U(t), V(t))$ is a state, and the state space is

$$X = \{(U, V) \mid U_i \le nt^*,\ i = 1, 2, \ldots, W\}.$$


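Now we can describe ‘Minimal Link for bandwidth limited cost’ in the framework of dynamic programming:

$$U_i(t+1) = \begin{cases} U_{i+1}(t) + C_{i+1}(d(t)) & \text{for } i = 1, 2, \ldots, W-1 \\ 0 & \text{for } i = W \end{cases}$$

$$V_{ij}(t+1) = \begin{cases} V_{i+1,\,j+1}(t) \lor \big(d_{i+1}(t) = j+1\big) \lor \big(d_{j+1}(t) = i+1\big) & \text{for } j = 2, \ldots, W-2 \\ \big(d_{i+1}(t) = W\big) \lor \big(d_k(t) = i+1 \text{ for some } 1 \le k \le W-1\big) & \text{for } j = W-1 \\ 0 & \text{for } j = W \end{cases} \qquad (i = 1, \ldots, j-1)$$

$$(U, V)(1) = 0, \qquad (U, V)(t) \in X$$

$$\text{minimize} \quad \sum_{t=1}^{M} \text{cost}(V(t), d(t))$$

We can solve this problem using the dynamic programming algorithm. [10]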
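The dynamic program can be prototyped directly. The sketch below is one possible implementation, not the thesis's code: for every reachable state it keeps the minimum number of links so far, and it identifies processors by their absolute index instead of the relative slot used in $U(t)$ and $V(t)$ (both encodings carry the same information). It assumes that $nt(i)$ counts cross terms only and that a cross term $x_l x_t$ held by $DM_k$ requires links from $DM_k$ to $DM_l$ and to $DM_t$. The function name `minimal_link_banded` and its parameters are hypothetical.

```python
from itertools import product

def minimal_link_banded(M, W, cross_terms, nt_star):
    """Minimum TL for 'Minimal Link for bandwidth limited cost', or None.

    M           -- number of variables; processors are DM_1..DM_M
    W           -- bandwidth: Q[l][t] == 0 whenever |l - t| >= W
    cross_terms -- set of pairs (l, t), l < t, with Q[l][t] != 0
    nt_star     -- Constraint 1 bound on cross terms per processor
    """
    # A state plays the role of (U(t), V(t)) with absolute indices:
    # (sorted (processor, count) pairs, frozenset of links inside the window).
    states = {((), frozenset()): 0}
    for t in range(1, M + 1):
        lo = max(1, t - W + 1)
        window = range(lo, t + 1)                       # DM_{t-W+1}, ..., DM_t
        terms = [l for l in range(lo, t) if (l, t) in cross_terms]
        new_states = {}
        for (counts, links), cost in states.items():
            for choice in product(window, repeat=len(terms)):   # one d(t)
                cnt = dict(counts)
                new_links = set(links)
                for l, k in zip(terms, choice):
                    cnt[k] = cnt.get(k, 0) + 1
                    for other in (l, t):                # DM_k must reach both
                        if other != k:
                            new_links.add((min(k, other), max(k, other)))
                if any(c > nt_star for c in cnt.values()):
                    continue                            # violates Constraint 1
                added = len(new_links) - len(links)     # cost(V(t), d(t))
                drop = t - W + 1                        # gets no more terms
                key = (tuple(sorted((p, c) for p, c in cnt.items() if p != drop)),
                       frozenset(e for e in new_links if drop not in e))
                if cost + added < new_states.get(key, float("inf")):
                    new_states[key] = cost + added
        states = new_states
    return min(states.values()) if states else None
```

For example, with $W = 3$ and the three cross terms of $x_1, x_2, x_3$, `minimal_link_banded(3, 3, {(1, 2), (1, 3), (2, 3)}, 1)` returns 2 (a tree), while tightening the bound to $nt^* = 0$ makes the instance infeasible and the function returns `None`.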


If we restrict $TL^*$ to be $M - 1$ (in other words, the organizational structure is a tree), some analytical statements can be made concerning ‘Load Balancing with Limited Link’. The following problem represents a situation where an organization with $M$ agents cannot afford to build more than $M - 1$ communication links.

Load  Balancing  for  tree  structure 

Given $J = x^T Q x$, find a decomposition that minimizes $\max_i nt(i)$ such that

$$TL \le M - 1,$$

or equivalently,

Given $J = x^T Q x$, find a decomposition that minimizes $\max_i nt(i)$ such that $TL = M - 1$.

We assume that $G_Q$ is connected. Therefore, it immediately follows from Lemma 2-1 that the resulting graph $G$ must be connected for any decomposition. Since a connected graph on $M$ nodes has at least $M - 1$ edges, the constraint $TL \le M - 1$ is equivalent to the constraint $TL = M - 1$.

Some  analytical  properties  of  these  problems  have  been  studied. 

Definition  3-4 

A centralized tree $T_c = (V, E)$, $|V| = M$, is a tree that has one node $DM_i$ with $d(DM_i) = M - 1$ and $M - 1$ nodes with degree 1. We call the node with degree $M - 1$ the center.

Lemma  3-5 

For  a  graph  G  =  (V,  E),  if  all  the  pairs  of  nodes  in  V  are  within  two  hops  from  one 
another,  and  G  has  no  cycle,  it  forms  a  centralized  tree. 

Theorem  3-6 

In the ‘Load Balancing for tree structure’ problem, if $G_Q$ has a clique of size $c$, then $\max_i nt(i) \ge \binom{c-1}{2}$.

Proof 

1) When the whole graph $G_Q$ is a clique of size $c = M$:

any pair of nodes of $G$ has to be within two hops of each other (Lemma 2-1). Therefore, the organizational structure is a centralized tree (Lemma 3-5). Let $DM_i$ be the center; then all the $\binom{c-1}{2}$ cross terms between the $c - 1$ neighbors must be assigned to $J^i$ (Lemma 3-2). Therefore, $nt(i) \ge \binom{c-1}{2}$. Q.E.D.

2) When $G_Q$ has a clique of size $c < M$:

let $S$ be the subgraph of $G$ that consists of the $c$ nodes corresponding to the clique. If all the nodes in $S$ are within two hops of one another through nodes of $S$, the argument reduces to 1). If there exists a pair $DM_i$, $DM_j$ in $S$ such that $DM_i$ and $DM_j$ are connected through a node $DM_k$ not in $S$, all $c$ nodes in $S$ must have a direct link to $DM_k$; otherwise, a node not having a direct link to $DM_k$ is more than two hops away from either $DM_i$ or $DM_j$. Therefore, $DM_k$ is the center of a centralized tree of size $c + 1$, and $J^k$ must have at least $\binom{c}{2} \ge \binom{c-1}{2}$ cross terms. Q.E.D.

An  interesting  observation  about  this  problem  is  the  following: 

Given $G_Q$ and the constraint $TL = M - 1$, it is possible for a tree that is not a subgraph of $G_Q$ to minimize $\max_i nt(i)$ over all possible trees.

This is illustrated in Figure 3-9. This fact indicates that, under a strong constraint on the number of communication links, an organization can minimize the load of its most heavily loaded division by introducing a link between divisions that do not directly influence one another.

Let us now observe some theoretical results for special cost functions. The following lemma will be used many times in the analytical discussion of ‘Load Balancing for tree structure’.

Lemma  3-7 

Suppose the global cost function $J$ has cross terms $x_i x_j$, $x_j x_k$, and $x_k x_i$. If there exist a link between $DM_i$ and $DM_l$ and a link between $DM_j$ and $DM_l$ in the graph $G$, $l \ne i, j, k$ (Figure 3-10a), then the cross terms $x_i x_k$ and $x_j x_k$ must be assigned to $DM_l$, as well as the cross term $x_i x_j$. Therefore, a link between $DM_l$ and $DM_k$ is necessary.

Proof 

From Lemma 3-2, $x_i x_j$ must be assigned to $DM_l$.

Suppose $x_i x_k$ is assigned to $DM_i$ or $DM_k$; then there exists a link between $DM_i$ and $DM_k$. Therefore, $DM_j$ and $DM_k$ are three hops apart, because there is a unique path between any pair of nodes in a tree (Figure 3-10b). This contradicts Lemma 2-1 because $Q(j, k) \ne 0$.

Suppose $x_i x_k$ is assigned to $DM_j$. Then there exists a link between $DM_i$ and $DM_j$ by the ‘Direct Communication’ assumption, so $DM_i$, $DM_j$, $DM_l$ form a loop. This contradicts the tree constraint.

Suppose $x_i x_k$ is assigned to $DM_{l'}$, $l' \ne i$, $l' \ne j$, $l' \ne l$. In this case, there exist links between $DM_i$ and $DM_{l'}$ and between $DM_k$ and $DM_{l'}$. Therefore, $DM_j$ and $DM_k$ are four hops apart (Figure 3-10c). This again contradicts Lemma 2-1. Therefore, $x_i x_k$ must be assigned to $DM_l$. Exchanging the roles of $i$ and $j$ shows that $x_j x_k$ must be assigned to $DM_l$ as well.

Q.E.D.

Let the global cost function be described by $G_Q = (V_Q, E_Q)$. We discuss the cases of special cost functions such that $G_Q$ has a subgraph $S = (V_S, E_S)$ of the form

$$V_S = \{i, 1, 2, \ldots, n\} \tag{3-1}$$

$$E_S = \{(i, j) \mid j = 1, 2, \ldots, n\} \cup \{(j, j+1) \mid j = 1, 2, \ldots, n-1\} \tag{3-2}$$

(See Figure 3-11.)

Lemma  3-8 

Assume that $G_Q$ has the structure introduced by equations (3-1), (3-2). For $j = 1, 2, \ldots, n-1$, if $x_i x_j$ is assigned to $DM_k$ ($k \ne i$ and $k \notin \{1, 2, \ldots, n\}$), then $x_i x_{j+1}$ must also be assigned to $DM_k$.

For $j = 2, \ldots, n$, if $x_i x_j$ is assigned to $DM_k$ ($k \ne i$ and $k \notin \{1, 2, \ldots, n\}$), then $x_i x_{j-1}$ must also be assigned to $DM_k$.

Proof 

i) For $j = 1, 2, \ldots, n-1$: since $x_i x_j$ is in $J^k$, there exist links between $DM_i$ and $DM_k$ and between $DM_j$ and $DM_k$ in $G$, the resulting organizational structure (‘Direct Communication’ assumption). Therefore, by Lemma 3-7 applied to $x_i x_j$, $x_j x_{j+1}$, $x_{j+1} x_i$, the cross term $x_i x_{j+1}$ must be assigned to $DM_k$.

ii) For $j = 2, \ldots, n$, the same argument shows that $x_i x_{j-1}$ must be assigned to $DM_k$.

Theorem  3-9 

Assume that $G_Q$ has the structure introduced by equations (3-1), (3-2). If $x_i x_j$ ($j$ is any of $\{1, 2, \ldots, n\}$) is assigned to $DM_k$, where $k \ne i$ and $k \notin \{1, 2, \ldots, n\}$, then the cross terms $x_i x_j$, $j = 1, 2, \ldots, n$, are all assigned to $DM_k$. The resulting structure of the organization must contain the following edges:

$$\{(DM_k, DM_i)\} \cup \{(DM_k, DM_j) \mid j = 1, 2, \ldots, n\}$$
Proof

By recursively applying the result of Lemma 3-8, we can easily see that $x_i x_j$ must be assigned to $DM_k$ for all $j = 1, 2, \ldots, n$ if one of them is assigned to $DM_k$. Therefore, from the ‘Direct Communication’ assumption, the resulting graph must contain

$$\{(DM_k, DM_i)\} \cup \{(DM_k, DM_j) \mid j = 1, 2, \ldots, n\}$$

Theorem  3-10 

Assume that $G_Q$ has the structure introduced by equations (3-1), (3-2). Suppose there is a link between $DM_i$ and $DM_j$ for some $j < n$ in $G$. ($x_i x_j$ must be assigned either to $DM_i$ or $DM_j$.)

1) If $x_i x_{j+1}$ is assigned to $DM_j$, the cross terms $x_i x_{j+2}, x_i x_{j+3}, \ldots, x_i x_n$ must all be assigned to $DM_j$.

2) If $x_i x_{j-1}$ is assigned to $DM_j$, the cross terms $x_i x_{j-2}, x_i x_{j-3}, \ldots, x_i x_1$ must all be assigned to $DM_j$.

Proof 

1) Since $x_i x_{j+1}$ is assigned to $DM_j$, there is a link between $DM_j$ and $DM_{j+1}$ (see Figure 3-12). The global cost $J$ has $x_i x_{j+1}$, $x_{j+1} x_{j+2}$, $x_{j+2} x_i$. Thus, by Lemma 3-7, $x_i x_{j+2}$ must be assigned to $DM_j$.

Consequently, there exists a link between $DM_{j+2}$ and $DM_j$. By using the same argument recursively, we can show that $x_i x_{j+3}, \ldots, x_i x_n$ must all be assigned to $DM_j$.

2) Same proof as 1) in the reverse direction.


Now we observe the case where $G_Q = (V_Q, E_Q)$ is of the form

$$V_Q = \{i, 1, 2, 3, \ldots, n\} \tag{3-3}$$

$$E_Q = \{(i, j) \mid j = 1, 2, 3, \ldots, n\} \cup \{(j, j+1) \mid j = 1, 2, 3, \ldots, n-1\} \cup \{(n, 1)\} \tag{3-4}$$

(Figure 3-13)

The results of Lemma 3-8 and Theorem 3-9 can be directly applied to this case because the graph $S = (V_S, E_S)$ described in eqns (3-1) and (3-2) is a subgraph of $G_Q$ described by eqns (3-3) and (3-4). Theorem 3-10 can be modified as follows:

Theorem  3-11 

Consider $G_Q$ described by eqns (3-3) and (3-4). Suppose there is a link between $DM_i$ and $DM_j$ for some $1 \le j \le n$ in $G$. ($x_i x_j$ must be assigned either to $DM_i$ or $DM_j$.) If $x_i x_{j'}$, where $(j, j') \in E_Q$, is assigned to $DM_j$, then all the cross terms of the form $x_i x_l$, $l = 1, 2, \ldots, n$, $l \ne j$, must be assigned to $DM_j$.

Proof 

For the sake of concise explanation, let us define

$rn(j) = (j + 1) \bmod n$ (meaning ‘right neighbor’)

$ln(j) = (j - 1) \bmod n$ (meaning ‘left neighbor’)

with indices taken modulo $n$ in $\{1, \ldots, n\}$; then $j' = rn(j)$ or $ln(j)$.

i) Suppose $x_i x_k$ is assigned to $DM_j$ for some $k \ne j, ln(j)$. We claim that $x_i x_{rn(k)}$ must also be assigned to $DM_j$. From the ‘Direct Communication’ assumption, there exists a link between $DM_j$ and $DM_k$ (see Figure 3-14). The global cost function $J$ has $x_i x_k$, $x_k x_{rn(k)}$, $x_{rn(k)} x_i$. Therefore, by Lemma 3-7, $x_i x_{rn(k)}$ must be assigned to $DM_j$.

If $x_i x_{rn(j)}$ is assigned to $DM_j$, all the cross terms of the form

$$x_i x_l, \quad l = (j+2) \bmod n,\ (j+3) \bmod n,\ \ldots,\ (j+n-1) \bmod n$$

are assigned to $DM_j$ by induction.

ii) Suppose $x_i x_k$ is assigned to $DM_j$ for some $k \ne j, rn(j)$. We can show that $x_i x_{ln(k)}$ must also be assigned to $DM_j$ by the same argument in the opposite direction. Therefore, by induction, if $x_i x_{ln(j)}$ is assigned to $DM_j$, all the cross terms of the form

$$x_i x_l, \quad l = (j-2) \bmod n,\ (j-3) \bmod n,\ \ldots,\ (j-n+1) \bmod n$$

are assigned to $DM_j$.

Q.E.D.

From  this  theorem  we  can  also  conclude  the  following: 

If there is a link between $DM_i$ and $DM_j$ for some $1 \le j \le n$ in $G$, and $x_i x_{j'}$, where $(j, j') \in E_Q$, is assigned to $DM_j$, the resulting graph $G$ is a centralized tree with center $DM_j$.

Lemma  3-12 

If $x_i x_j$ is assigned to $DM_{j'}$, $j' \ne i, j$, the resulting graph $G$ is a centralized tree with center $DM_{j'}$.

Proof 

i) $j' = ln(j)$ or $rn(j)$:

There must be a link between $DM_i$ and $DM_{j'}$ by the ‘Direct Communication’ assumption, since $x_i x_j$ is assigned to $DM_{j'}$. Therefore, by Theorem 3-11 (with the roles of $j$ and $j'$ exchanged), all the cross terms of the form

$$x_i x_l, \quad l = 1, 2, \ldots, n, \ l \ne j'$$

are assigned to $DM_{j'}$. Therefore, the resulting graph is a centralized tree with center $DM_{j'}$.

ii) $j' \ne ln(j)$ and $j' \ne rn(j)$:

Since $x_i x_j$ is assigned to $DM_{j'}$, there are links between $DM_i$ and $DM_{j'}$, and between $DM_j$ and $DM_{j'}$.

The global cost function $J$ has $x_i x_j$, $x_j x_{rn(j)}$, $x_{rn(j)} x_i$, so by Lemma 3-7, $x_i x_{rn(j)}$ must be assigned to $DM_{j'}$. Therefore, there must be a link between $DM_{j'}$ and $DM_{rn(j)}$. By recursively applying Lemma 3-7, we can see that all the cross terms of the form

$$x_i x_l, \quad l = (j+2) \bmod n,\ (j+3) \bmod n,\ \ldots,\ ln(j')$$

are assigned to $DM_{j'}$.

The global cost function $J$ has $x_i x_j$, $x_j x_{ln(j)}$, $x_{ln(j)} x_i$, so by Lemma 3-7, $x_i x_{ln(j)}$ must be assigned to $DM_{j'}$. Therefore, there must be a link between $DM_{j'}$ and $DM_{ln(j)}$. By recursively applying Lemma 3-7, we can see that all the cross terms of the form

$$x_i x_l, \quad l = (j-2) \bmod n,\ (j-3) \bmod n,\ \ldots,\ rn(j')$$

are assigned to $DM_{j'}$.

Therefore, the resulting graph is a centralized tree with center $DM_{j'}$.
Q.E.D.

Theorem  3-13 

For a global cost function described by $G_Q$ in eqns (3-3) and (3-4), with $n > 3$, the optimally balanced decomposition under the ‘tree constraint’ gives

$$\min_{\text{tree}} \max_i nt(i) = n$$

and the resulting graph is a centralized tree with center $DM_i$.

Proof 

i) Suppose that for some $j \in \{1, 2, \ldots, n\}$, $x_i x_j$ is assigned to $DM_k$, $k \ne i, j$. By Lemma 3-12, the resulting organizational structure is a centralized tree with center $DM_k$ (Figure 3-15a). For all $l \ne k, i$, $DM_i$ and $DM_l$ are two hops apart, and $DM_k$ is the connector. Therefore, by Lemma 3-2, the $n - 1$ cross terms of the form $x_i x_l$, $l \ne k, i$, must necessarily be assigned to $DM_k$. Also, for

$$l = (k+1) \bmod n,\ (k+2) \bmod n,\ \ldots,\ (k+n-2) \bmod n$$

$DM_l$ and $DM_{rn(l)}$ are two hops apart, and $DM_k$ is the connector. Therefore, the $n - 2$ cross terms of the form $x_l x_{rn(l)}$ must be assigned to $DM_k$ for the same values of $l$.

Consequently, $J^k$ must have at least $2n - 3$ cross terms. Thus, $nt(k) \ge 2n - 3$.

ii) Suppose that for all $j = 1, 2, \ldots, n$, $x_i x_j$ is assigned to $DM_i$ or $DM_j$. By Lemma 3-1, there must exist a link between $DM_i$ and $DM_j$ for $j = 1, 2, \ldots, n$. Therefore, the resulting organizational structure is a centralized tree with center $DM_i$ (Figure 3-15b). For all $j = 1, 2, \ldots, n$, $DM_j$ and $DM_{rn(j)}$ are two hops apart, and $DM_i$ is the connector. Therefore, by Lemma 3-2, the $n$ cross terms of the form $x_j x_{rn(j)}$, $j = 1, 2, \ldots, n$, must be assigned to $DM_i$, so $nt(i) \ge n$. If we assign the cross terms of the form $x_i x_j$ to $DM_j$ for $j = 1, 2, \ldots, n$, $J^i$ ends up having $n$ cross terms, and each $J^j$ ends up having one cross term, for $j = 1, 2, \ldots, n$. Therefore,

$$nt(i) = n, \qquad \max_k nt(k) = n$$

From i) and ii), since $2n - 3 > n$ for $n > 3$,

$$\min \max_k nt(k) = n$$

Q.E.D.
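Theorem 3-13 can be spot-checked by brute force for small $n$. The sketch below is an illustrative check, not part of the original argument: it enumerates every labeled tree on the $n + 1$ processors via Prüfer sequences (including trees that are not subgraphs of $G_Q$), places each cross term as Lemma 3-2 dictates (either endpoint of a tree edge, or the unique connector of a two-hop pair), discards trees that leave a coupled pair more than two hops apart, and minimizes $\max_k nt(k)$ over the remaining choices. Node 0 stands for the hub $DM_i$, $nt$ counts cross terms only, and the helper names are hypothetical. The search is exponential, so it is only meant for very small $n$.

```python
from itertools import product

def labeled_trees(nodes):
    """Yield every labeled tree on `nodes` as a set of frozenset edges (Prufer decoding)."""
    n = len(nodes)
    for seq in product(range(n), repeat=n - 2):
        degree = [1] * n
        for s in seq:
            degree[s] += 1
        edges = set()
        for s in seq:
            leaf = min(v for v in range(n) if degree[v] == 1)
            edges.add(frozenset((nodes[leaf], nodes[s])))
            degree[leaf] -= 1
            degree[s] -= 1
        u, w = [v for v in range(n) if degree[v] == 1]
        edges.add(frozenset((nodes[u], nodes[w])))
        yield edges

def min_max_load(nodes, cross_terms):
    """Brute-force minimum over trees and admissible assignments of max_k nt(k)."""
    best = None
    for edges in labeled_trees(nodes):
        adj = {v: set() for v in nodes}
        for e in edges:
            a, b = tuple(e)
            adj[a].add(b)
            adj[b].add(a)
        load = {v: 0 for v in nodes}
        free = []                          # terms on tree edges (Lemma 3-2, part 1)
        feasible = True
        for term in cross_terms:
            i, j = tuple(term)
            common = adj[i] & adj[j]
            if term in edges:
                free.append((i, j))
            elif common:
                (k,) = common
                load[k] += 1               # forced connector (Lemma 3-2, part 2)
            else:
                feasible = False           # more than two hops apart (Lemma 2-1)
                break
        if not feasible:
            continue
        for choice in product((0, 1), repeat=len(free)):
            trial = dict(load)
            for pair, side in zip(free, choice):
                trial[pair[side]] += 1
            worst = max(trial.values())
            if best is None or worst < best:
                best = worst
    return best

def wheel(n):
    """G_Q of eqns (3-3), (3-4): hub 0 plus the rim cycle 1, 2, ..., n."""
    terms = {frozenset((0, j)) for j in range(1, n + 1)}
    terms |= {frozenset((j, j % n + 1)) for j in range(1, n + 1)}
    return list(range(n + 1)), terms
```

For $n = 4$, `min_max_load(*wheel(4))` returns 4, in line with $\min_{\text{tree}} \max_i nt(i) = n$.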


Concerning analytic discussions of the relationship between a flexible organizational structure and its global cost function, there is a lot of room for research. The following open questions are suggested for future research.

Does there always exist a tree contained in $G_Q$ which minimizes $\max_i nt(i)$ over all trees?

As for an organizational task, what more can be said about $G_Q$ between the two extremes of strong connectivity and diagonality?

How can we define a measure of the intricacy of the task? One idea for defining a coupling measure is

$$\frac{1}{\max_{(i,j)} \{\text{shortest path between } i \text{ and } j \text{ in } G_Q\}}$$
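The proposed measure is the reciprocal of the diameter of $G_Q$. Here is a minimal sketch, assuming $G_Q$ is connected, has at least two nodes, and is given as an adjacency dictionary (the function name is hypothetical). It computes the diameter with one breadth-first search per node.

```python
from collections import deque

def coupling_measure(adj):
    """1 / (maximum over node pairs of the shortest-path length in G_Q).

    adj -- dict: node -> iterable of neighbours; G_Q is assumed connected
    and to have at least two nodes.
    """
    def eccentricity(source):
        dist = {source: 0}
        queue = deque([source])
        while queue:                       # breadth-first search from source
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        if len(dist) != len(adj):
            raise ValueError("G_Q is not connected")
        return max(dist.values())

    diameter = max(eccentricity(v) for v in adj)
    return 1 / diameter
```

For the path 1-2-3-4 this gives 1/3; for a complete graph it gives 1, the most tightly coupled case.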

We want to examine other ways of defining a measure of coupling. We expect that $\min \max_i nt(i)$ increases as the measure of coupling in the task increases under a constrained $TL$. We also want to develop some criteria according to which one can check a cost function (represented by $G_Q$) to see if decentralized computation results in a better speed of convergence.
3.1.3 Semi-flexible organization


So far, we have implicitly assumed that $DM_i$ is in charge of the value of $x_i$. We are also interested in the following problem, where the organization designer has the freedom to mandate which division ($DM$'s) is responsible for which decision ($x_i$'s), but the communication structure of the organization is fixed. As in the fixed organization, $TL$ is fixed in this case. The problem is to find an assignment of decision variables and subcost functions to processors that minimizes $\max_i nt(i)$.

Mapping  for  Load  Balancing 
Given $G_Q = (V_Q, E_Q)$ and $G = (V, E)$, find
a decomposition $J(x) = \sum_k J^k$ and
a matching between $\{x_1, x_2, \ldots, x_M\}$ and $V$,
so that these minimize $\max_i nt(i)$.


This problem is a hybrid of two problems: the mapping strategy for parallel processing [11] and load balancing. Once a matching between $\{x_1, x_2, \ldots, x_M\}$ and $V$ is determined, the problem is reduced to ‘Load Balancing in a fixed structure’ in section 3.1.1. Let us break up this problem into two parts: 1) matching, 2) decomposition of $J$. In the matching part, if we assume $|V| = |V_Q|$, there are $|V_Q|!$ matchings. Because of the ‘Direct Communication’ assumption, we have to select from these matchings the ones having the following property: if $(i, j) \in E_Q$, then $\sigma(x_i)$ and $\sigma(x_j)$ are within two hops in $G$, where $\sigma$ denotes the matching. If we run the algorithms presented in section 3.1.1 for each of these matchings and single out the one with the smallest $\min \max_i nt(i)$, ‘Mapping for Load Balancing’ will be solved. Therefore, the focus should be on how to efficiently select matchings with the property specified above. First, let us consider the question of whether such a mapping exists.

Existence  of  a  feasible  Mapping 

Given $G_Q = (V_Q, E_Q)$ and $G = (V, E)$, does there exist a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

such that

$\sigma(x_i)$ and $\sigma(x_j)$ are within two hops of each other in $G$, $\forall (i, j) \in E_Q$?

Here is another way of viewing ‘Existence of a feasible Mapping’ in the framework of graph theory. Let us define a transformation of the graph $G = (V, E)$:

$$F(G) = (V, E_F)$$

where

$$E_F = E \cup \{(DM_i, DM_j) \notin E \mid DM_i \text{ and } DM_j \text{ are two hops from each other}\}$$

(See Figure 3-16 for an example.) ‘Existence of a feasible mapping’ is equivalent to whether $G_Q$ can be embedded in $F(G)$. If there is a one-to-one mapping $\sigma$ in the ‘Existence of a feasible mapping’ problem, each node $i$ of $G_Q$ can be embedded in $\sigma(x_i)$ in $F(G)$. For each pair $(i, j) \in E_Q$, $\sigma(x_i)$ and $\sigma(x_j)$ are within two hops of each other in $G$, so $(\sigma(x_i), \sigma(x_j)) \in F(G)$. Therefore, $G_Q$ can be embedded in $F(G)$. Conversely, if $G_Q$ is embedded in $F(G)$, we can construct a one-to-one mapping $\sigma$ such that $\sigma(x_i)$ is the node in which $i \in V_Q$ is embedded. Then, for every edge $(i, j) \in E_Q$, $(\sigma(x_i), \sigma(x_j)) \in F(G)$ because $G_Q$ is embedded in $F(G)$. Therefore, $\sigma(x_i)$ and $\sigma(x_j)$ are within two hops of each other in $G$.

The problem of determining whether $G_Q$ can be embedded in $F(G)$ is a special case of the ‘Subgraph Isomorphism’ problem [9].


Subgraph Isomorphism
INSTANCE : Graphs $H_1 = (V_1, E_1)$, $H_2 = (V_2, E_2)$

QUESTION : Does $H_1$ contain a subgraph isomorphic to $H_2$?

If we restrict $H_1$ to the set of graphs

$$\{\text{graph } H \mid \exists \text{ a graph } \Gamma \text{ such that } F(\Gamma) = H\}$$

this restricted version is equivalent to ‘Existence of a feasible Mapping’. (Note that some graphs do not belong to this set. For an example, see the Appendix.)

‘Existence of a feasible Mapping’ is equivalent only to a restricted version of ‘Subgraph Isomorphism’, so the NP-completeness of the latter does not carry over directly. However, it is very similar in structure to the original ‘Subgraph Isomorphism’ problem, which is known to be NP-complete. Therefore, we conjecture that this problem is NP-complete. Consequently, we also conjecture that ‘Mapping for Load Balancing’ is NP-complete.
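Because no polynomial algorithm is known, the only general test sketched here is exhaustive search. The following Python sketch (hypothetical function names, intended only for tiny instances) enumerates every one-to-one map $\sigma$ and checks whether it embeds $G_Q$ in $F(G)$; the number of candidates grows factorially, which is consistent with the conjecture above.

```python
from itertools import combinations, permutations


def two_hop_closure(vertices, edges):
    """Edge set of F(G): edges of G plus pairs sharing a common neighbour."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    e_f = {frozenset(e) for e in edges}
    e_f |= {frozenset((u, v)) for u, v in combinations(vertices, 2) if adj[u] & adj[v]}
    return e_f


def feasible_mapping(vq, eq, v, e):
    """Exhaustively search one-to-one maps sigma: V_Q -> V.

    Returns a dict sigma under which every (i, j) in E_Q lands on an edge of
    F(G), i.e. G_Q embeds in F(G); returns None if no such map exists.
    """
    e_f = two_hop_closure(v, e)
    for image in permutations(v, len(vq)):
        sigma = dict(zip(vq, image))
        if all(frozenset((sigma[i], sigma[j])) in e_f for i, j in eq):
            return sigma
    return None


path4 = ["a", "b", "c", "d"]
path4_edges = [("a", "b"), ("b", "c"), ("c", "d")]
triangle = [(1, 2), (2, 3), (1, 3)]
k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(feasible_mapping([1, 2, 3], triangle, path4, path4_edges))  # {1: 'a', 2: 'b', 3: 'c'}
print(feasible_mapping([1, 2, 3, 4], k4, path4, path4_edges))     # None
```

F(path of 4 nodes) has only five edges, so the six edges of $K_4$ cannot all be realized.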


3.2  RL  as  an  amount  of  communication 


When  RL  is  used  as  a  measure  of  the  amount  of  communication,  task  allocation 
in  a  flexible  organization  becomes  tractable.  Also,  it  turns  out  that  task  allocation 
in  a  flexible  organization  is  a  special  case  of  task  allocation  in  a  fixed  organization. 
Therefore,  the  case  of  flexible  organization  will  be  presented  first,  and  the  idea  used 
for  this  case  will  be  generalized  for  the  case  of  fixed  organizational  structure. 


3.2.1  Flexible  organizational  structure 


Minimal  Superposed  Link 

Given $J = x^T Q x$ and $nt^*_i$, find a decomposition that minimizes RL such that $nt(i) \le nt^*_i$, $i = 1, 2, \ldots, M$.

If communication load is defined by RL (section 2.2.3), minimization of communication load can be formulated as a special binary integer linear program. Let

$$V = \{DM_1, DM_2, DM_3, \ldots, DM_M\}$$

$$E_Q = \{\text{unordered pair } (i,j) \mid \text{the cross term } x_i x_j \text{ is in } J\}$$

Let the variable

$$x_{ijk} = \begin{cases} 1 & \text{if } x_i x_j \text{ is assigned to } DM_k \\ 0 & \text{otherwise} \end{cases}$$

Since each cross term is assigned to exactly one processor,

$$\sum_{k=1}^{M} x_{ijk} = 1 \quad \forall (i,j) \in E_Q$$

Let the capacity of processor $DM_k$ be $nt^*_k$ (the number of cross terms it can handle). The number of cross terms assigned to $DM_k$ must be within its capacity, so

$$\sum_{(i,j) \in E_Q} x_{ijk} \le nt^*_k, \quad k = 1, 2, \ldots, M$$

For a variable $x_{ijk} = 1$, if $k$ is $i$ or $j$, one link is introduced, namely a link between $DM_i$ and $DM_j$. If $k$ is neither $i$ nor $j$, two links are introduced, namely a link between $DM_i$ and $DM_k$ and a link between $DM_j$ and $DM_k$ (section 2.2.3). Therefore, the communication load RL is

$$RL = \sum_{(i,j) \in E_Q} \Big( \sum_{k \ne i,\, k \ne j} 2 x_{ijk} + \sum_{k = i} x_{ijk} + \sum_{k = j} x_{ijk} \Big)$$
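To make the RL formula concrete, here is a small Python sketch that evaluates RL and the loads $nt(k)$ for a given assignment. It assumes an assignment is stored as a dict from a cross term $(i, j)$ to the index $k$ with $x_{ijk} = 1$; the function names are hypothetical.

```python
def superposed_links(assignment):
    """RL of a decomposition, following the formula above.

    `assignment` maps each cross term (i, j) in E_Q to the index k of the
    processor DM_k that receives it (x_ijk = 1). A term kept by DM_i or DM_j
    adds one link; a term sent to a third processor adds two.
    """
    return sum(1 if k in (i, j) else 2 for (i, j), k in assignment.items())


def loads(assignment, m):
    """nt(k): number of cross terms assigned to DM_k, for k = 1..M."""
    nt = {k: 0 for k in range(1, m + 1)}
    for k in assignment.values():
        nt[k] += 1
    return nt


# Cross terms x1x2, x2x3, x1x3; x1x3 is sent to DM_2, a third processor.
a = {(1, 2): 1, (2, 3): 3, (1, 3): 2}
print(superposed_links(a))  # 4
print(loads(a, 3))          # {1: 1, 2: 1, 3: 1}
```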

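In summary, the decomposition problem is reduced to the following binary integer linear programming problem. We call this problem the binary integer linear programming problem ‘BIL’.

$$\begin{aligned}
\text{minimize} \quad & \sum_{(i,j) \in E_Q} \Big( \sum_{k \ne i,\, k \ne j} 2 x_{ijk} + \sum_{k = i} x_{ijk} + \sum_{k = j} x_{ijk} \Big) \\
\text{such that} \quad & \sum_{k=1}^{M} x_{ijk} = 1 \quad \forall (i,j) \in E_Q \\
& \sum_{(i,j) \in E_Q} x_{ijk} \le nt^*_k, \quad k = 1, 2, \ldots, M \\
& x_{ijk} \in \{0, 1\}
\end{aligned} \qquad \text{(BIL)}$$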
Without the last constraint, which is the integrality constraint, this linear programming problem is a linear network problem. In fact, this problem can be formulated as a ‘minimum cost network flow’ problem [6]. Figure 3-17 shows a minimum cost network flow problem that is equivalent to BIL. Each node $m_{ij}$ corresponds to a pair $(i,j) \in E_Q$, and each node $n_k$ corresponds to $DM_k \in V$. Minimum cost network flow problems can be easily (polynomially) transformed to Hitchcock problems [6]. Figure 3-18 shows how to transform our minimum cost flow problem to a Hitchcock problem. In the Hitchcock problem the feasible set is represented by a system of linear equations, and the coefficients of those linear equations form a node-arc incidence matrix. A node-arc incidence matrix is ‘totally unimodular’ [6], and the right-hand sides here (unit demands and capacities) are integers; therefore, the optimal vertices have integer elements. Consequently, the simplex algorithm produces an integer optimal flow of the network, so the simplex algorithm solves BIL. We can expect any variation of the simplex algorithm to solve BIL as long as the feasible points move from vertex to vertex. We now explicitly show that some well-known algorithms can be applied to solve BIL.


Primal-Dual  algorithm 

The Primal-Dual algorithm [6] can be run to solve the minimum cost network flow problem corresponding to BIL (e.g. Figure 3-17). Let us call the network N. Let $f$ be a vector that indicates the flow of the network, and let $f(i,j)$ be the flow through a directed arc $(i,j)$. For a feasible flow $f$, the Primal-Dual algorithm [6] constructs an increment network $N'(f)$ for the current feasible flow by changing the arcs of N as follows: for each directed arc $(m_{ij}, n_k)$ with cost $c$, add a directed arc $(n_k, m_{ij})$ with capacity $[0, f(m_{ij}, n_k)]$ and with cost $-c$ (Figure 3-19a). For each arc $(n_i, t)$, change the capacity to $[0, nt^*_i - f(n_i, t)]$ and add a directed arc $(t, n_i)$ with capacity $[0, f(n_i, t)]$ (Figure 3-19b). At each iteration, the Primal-Dual algorithm finds a negative-cost cycle in $N'(f)$ and increases the flow through the arcs of this cycle by the minimum of the upper bounds of the arcs in that cycle.

We  claim  that  if  we  choose  an  integer  initial  flow,  the  algorithm  will  produce 
an  integer  optimal  flow. 

Proof 

Let $f$ be a current flow that is integral. Since the bound on the flow along every arc of the network (e.g. Figure 3-17) is an integer, the bound on the flow along every arc in the increment network $N'(f)$ is also an integer. Therefore, if a negative cycle exists in $N'(f)$, the incremental flow $\delta$, being the minimum of these integer bounds, is an integer. This incremental flow is along the arcs of that negative cycle; the flow on arcs that are not in the negative cycle does not change. Therefore, the new flow is also integral. By induction, all the flows in the sequence are integral.

The integer flows of the minimum cost network flow problem correspond to feasible assignments of cross terms, and the cost of a flow corresponds to the communication load required by that assignment. Therefore, if we manage to obtain a feasible, integer initial flow, we can obtain an optimal decomposition by running the Primal-Dual algorithm on the network exemplified by Figure 3-17. The question is now how to obtain a feasible, integer initial flow. We can do so by converting a network like Figure 3-17 into a single-source, single-sink network: add a source node $s$ and construct a link from $s$ to each node $m_{ij}$, with capacity $[0, 1]$ (Figure 3-20). By running a max-flow algorithm such as Algorithm 3.1 on this single-source, single-sink network, we obtain an integer initial flow through arcs of type $(m_{ij}, n_k)$ and $(n_k, t)$. This flow is feasible when its value equals $|E_Q|$, i.e. when every cross term has been assigned.
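The procedure above can be sketched in Python as follows. This is a sketch under stated assumptions, not the author's implementation: processors are numbered $1..M$; the initial flow comes from a BFS augmenting-path max-flow routine standing in for Algorithm 3.1; and the improvement phase repeatedly cancels a negative-cost cycle found by Bellman-Ford in the increment network, i.e. the cycle-canceling step described in the text rather than a complete primal-dual code. The name `min_rl_assignment` is hypothetical.

```python
from collections import deque


def min_rl_assignment(cross_terms, capacity):
    """Minimal Superposed Link via max flow + negative-cycle canceling.

    cross_terms: list of distinct pairs (i, j); capacity: dict k -> nt*_k.
    Returns (RL, {(i, j): k}) or None if the capacities cannot hold all terms.
    """
    procs = sorted(capacity)
    names = ["s", "t"] + [("m", e) for e in cross_terms] + [("n", k) for k in procs]
    idx = {name: pos for pos, name in enumerate(names)}
    graph = [[] for _ in names]  # residual arcs: [head, residual cap, cost, reverse index]

    def add_arc(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    def push(u, a, delta):
        arc = graph[u][a]
        arc[1] -= delta
        graph[arc[0]][arc[3]][1] += delta

    s, t = idx["s"], idx["t"]
    for i, j in cross_terms:
        m = idx[("m", (i, j))]
        add_arc(s, m, 1, 0)
        for k in procs:  # cost 1 if DM_i or DM_j keeps the term, else 2
            add_arc(m, idx[("n", k)], 1, 1 if k in (i, j) else 2)
    for k in procs:
        add_arc(idx[("n", k)], t, capacity[k], 0)

    # Integer initial flow: BFS augmenting paths from s to t (max flow).
    flow = 0
    while True:
        prev, queue = {s: None}, deque([s])
        while queue and t not in prev:
            u = queue.popleft()
            for a, arc in enumerate(graph[u]):
                if arc[1] > 0 and arc[0] not in prev:
                    prev[arc[0]] = (u, a)
                    queue.append(arc[0])
        if t not in prev:
            break
        path, v = [], t
        while prev[v] is not None:
            path.append(prev[v])
            v = prev[v][0]
        delta = min(graph[u][a][1] for u, a in path)
        for u, a in path:
            push(u, a, delta)
        flow += delta
    if flow < len(cross_terms):
        return None  # some cross term cannot be assigned

    # Cancel negative-cost cycles in the increment network (Bellman-Ford).
    n = len(names)
    while True:
        dist, pred, last = [0] * n, [None] * n, None
        for _ in range(n):
            last = None
            for u in range(n):
                for a, arc in enumerate(graph[u]):
                    if arc[1] > 0 and dist[u] + arc[2] < dist[arc[0]]:
                        dist[arc[0]] = dist[u] + arc[2]
                        pred[arc[0]] = (u, a)
                        last = arc[0]
        if last is None:
            break  # no negative cycle: the current flow has minimum cost
        for _ in range(n):  # walk back n steps to land on the cycle
            last = pred[last][0]
        cycle, v = [], last
        while True:
            cycle.append(pred[v])
            v = pred[v][0]
            if v == last:
                break
        delta = min(graph[u][a][1] for u, a in cycle)
        for u, a in cycle:
            push(u, a, delta)

    assignment, rl = {}, 0
    for i, j in cross_terms:
        for arc in graph[idx[("m", (i, j))]]:
            if names[arc[0]][0] == "n" and arc[1] == 0:  # saturated forward arc
                assignment[(i, j)] = names[arc[0]][1]
                rl += arc[2]
    return rl, assignment


terms = [(1, 2), (2, 3), (1, 3)]
print(min_rl_assignment(terms, {1: 1, 2: 1, 3: 1})[0])  # 3
print(min_rl_assignment(terms, {1: 3, 2: 0, 3: 0})[0])  # 4
print(min_rl_assignment(terms, {1: 1, 2: 0, 3: 1}))     # None
```

All arc capacities and costs are integers, so every flow in the sequence stays integral, mirroring the proof above.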


Load  Balancing  with  Limited  Superposed  Link 
Given $J = x^T Q x$ and $RL^*$, find a decomposition that minimizes $\max_i nt(i)$ such that $RL \le RL^*$.

The idea used for ‘Load Balancing in a Fixed Structure’ (section 3.1.1) can be used for this problem. The smallest $\max_i nt(i)$ we can possibly have is $\lceil |E_Q|/M \rceil$, where $|E_Q|$ is the total number of cross terms in J and M is the number of processors. For the most unbalanced decomposition, $\max_i nt(i)$ is $|E_Q|$ (when all the cross terms are assigned to a single processor). Initially, we set the capacity of all the arcs of type $(n_k, t)$ to $\lceil |E_Q|/M \rceil$ and minimize RL. Minimizing RL is solving ‘Minimal Superposed Link’, so we can use the Primal-Dual algorithm. If we obtain $RL \le RL^*$, we have achieved $\min \max_i nt(i)$; otherwise, we increase the capacity of the arcs $(n_k, t)$ by one. (Raising the capacity only enlarges the feasible set, so the minimum RL never increases as the capacity grows.) If we do not obtain $RL \le RL^*$ by the time the capacity of these arcs reaches $|E_Q|$, the given instance of the problem is infeasible. In other words, no matter how we decompose J, we need more than $RL^*$ for the amount of communication. The following algorithm summarizes our discussion:

Algorithm 3.4

1. Construct a network corresponding to BIL, where $nt^*_k = dummy$ for $k = 1, 2, \ldots, M$

2. Set $dummy = \lceil |E_Q|/M \rceil$

3. Do while $dummy \le |E_Q|$

Run Primal-Dual algorithm

If $RL \le RL^*$, $\min \max_i nt(i) = dummy$; terminate

$dummy := dummy + 1$

4. Return “Infeasible”
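A small Python sketch of the outer loop of Algorithm 3.4. To keep the block self-contained, the Primal-Dual step is replaced by a hypothetical exhaustive search (`min_rl_with_cap`) that is exponential in $|E_Q|$ and only suitable for tiny illustrative instances; processors are assumed to be numbered $1..M$.

```python
from itertools import product
from math import ceil


def min_rl_with_cap(cross_terms, m, cap):
    """Stand-in for the Primal-Dual step: exhaustive minimum RL when every
    processor DM_1..DM_M may hold at most `cap` cross terms."""
    best = None
    for choice in product(range(1, m + 1), repeat=len(cross_terms)):
        if any(choice.count(k) > cap for k in range(1, m + 1)):
            continue  # violates the common capacity nt*_k = dummy
        rl = sum(1 if k in term else 2 for term, k in zip(cross_terms, choice))
        if best is None or rl < best:
            best = rl
    return best


def algorithm_3_4(cross_terms, m, rl_star):
    """Raise the common capacity `dummy` from ceil(|E_Q|/M) until min RL <= RL*."""
    dummy = ceil(len(cross_terms) / m)
    while dummy <= len(cross_terms):
        rl = min_rl_with_cap(cross_terms, m, dummy)
        if rl is not None and rl <= rl_star:
            return dummy  # = min max_i nt(i)
        dummy += 1
    return "Infeasible"


k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
# Six processors; DM_5 and DM_6 own no variable that appears in a cross term.
print(algorithm_3_4(k4, 6, 8))  # 1: perfect balance costs RL = 8
print(algorithm_3_4(k4, 6, 6))  # 2: RL = 6 needs capacity 2
```

With capacity 1 every processor holds exactly one term, so $DM_5$ and $DM_6$ each receive a term at cost 2; raising the capacity to 2 lets every term stay with one of its own processors.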


3.2.2  Fixed  organization 

In the previous sections the amount of communication was defined as the total number of message transfers, where a processor must transmit one message for each cross term that is assigned to another processor and involves its variable, and a flexible organizational structure was assumed. This means that the task allocator has the authority to construct a link between any pair of processors if necessary: if one or more messages need to be transferred, the allocator can freely put links between those processors. In this section, the case of a fixed organizational structure is discussed. The links between processors are fixed; therefore, the allocator or the allocation algorithm must decompose the global cost function such that messages are transferred only through existing links. Under this constraint we want to minimize the amount of necessary communication defined by RL.

Minimal  Superposed  Link  in  a  Fixed  Structure 
Given a graph G, $G_Q$ and $nt^*_i$, find a decomposition that minimizes RL such that messages are transferred only through the edges of G and $nt(i) \le nt^*_i$, $i = 1, 2, \ldots, M$.
When an organization has a fixed structure a priori (here, structure simply means which subdivision can communicate with which) and the task allocator must partition the global task, he must partition it so that the necessary message transfers among subtasks occur only between pairs of subdivisions that can communicate with each other. Each subdivision has its own capacity. Consider a situation where the leader of this organization wants to allocate the tasks with the lowest possible amount of message transfer. The formulation in this section mathematically models this task allocation problem of an organization. It also applies to a computer processor network on which a distributed algorithm is running.

This problem is a generalization of the problem solved in section 3.2.1: if the fixed structure of this problem is a complete graph, this problem is exactly equivalent to the problem solved in section 3.2.1. We can design a polynomial-time algorithm by slightly modifying the algorithm introduced in the previous sections. Instead of running the ‘Primal-Dual algorithm’ on the network in Figure 3-17, we run it on a modified network that corresponds to the fixed organizational structure. The following algorithm shows how to construct such a network.

Algorithm  3.5 
Phase  1 

1. Create $|E_Q|$ nodes corresponding to cross terms of J. Let $m_{ij}$ denote the node corresponding to $x_i x_j$.

2. Create M nodes corresponding to processors. Let $n_i$ denote the node corresponding to $DM_i$.

3. For each cross term $x_i x_j$ do

If $(DM_i, DM_j) \in E$, make an arc from $m_{ij}$ to $n_i$ and an arc from $m_{ij}$ to $n_j$. Let the cost of these two arcs be 1.

For each neighbor of $DM_i$ in G, say $DM_k \ne DM_j$,

If $(DM_k, DM_j) \in E$, make an arc from $m_{ij}$ to $n_k$. Let the cost of this arc be 2, since assigning $x_i x_j$ to $DM_k$ introduces two links.

If no arc is made for $x_i x_j$, terminate with the output

“Organization cannot handle this task.”

All the arcs created in Step 3 have infinite capacity.

4. Create the sink node t and make an arc from each $n_i$ to t. Each arc $(n_i, t)$ created at Step 4 has capacity $nt^*_i$ and cost 0.

At Step 3, if no arc is made for some cross term $x_i x_j$, then $DM_i$ and $DM_j$ are not within two hops from each other. Therefore, this term cannot be assigned to any processor (Lemma 2-1), and the organization cannot handle this task.
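Phase 1 of Algorithm 3.5 can be sketched in Python as below. Assumptions: the fixed structure G is a list of undirected links between processor indices, arcs are returned as `(tail, head, capacity, cost)` tuples, and the source node used to obtain an initial flow is omitted. A processor "adjacent to $DM_i$ with $(DM_k, DM_j) \in E$" is exactly a common neighbor of $DM_i$ and $DM_j$, which is how the sketch finds it; third-processor arcs cost 2, matching the two links counted in RL. The function name is hypothetical.

```python
import math


def algorithm_3_5_phase1(cross_terms, edges, capacity):
    """Build the arcs of the network for 'Minimal Superposed Link in a Fixed Structure'.

    cross_terms: pairs (i, j); edges: links of the fixed structure G;
    capacity: dict k -> nt*_k. Returns a list of (tail, head, capacity, cost).
    """
    adj = {k: set() for k in capacity}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    arcs = []
    for i, j in cross_terms:
        m = ("m", i, j)
        before = len(arcs)
        if j in adj[i]:  # DM_i and DM_j are linked: either may keep the term
            arcs.append((m, ("n", i), math.inf, 1))
            arcs.append((m, ("n", j), math.inf, 1))
        for k in sorted(adj[i] & adj[j]):  # common neighbours DM_k of DM_i and DM_j
            arcs.append((m, ("n", k), math.inf, 2))
        if len(arcs) == before:
            raise ValueError(f"Organization cannot handle this task (term x{i}x{j})")
    for k in sorted(capacity):
        arcs.append((("n", k), "t", capacity[k], 0))
    return arcs


# Fixed structure: the path DM_1 - DM_2 - DM_3 - DM_4, each with capacity 2.
path = [(1, 2), (2, 3), (3, 4)]
caps = {1: 2, 2: 2, 3: 2, 4: 2}
for arc in algorithm_3_5_phase1([(1, 2), (1, 3)], path, caps):
    print(arc)
```

Here $x_1x_2$ gets two cost-1 arcs, $x_1x_3$ gets a single cost-2 arc to $n_2$, and a term such as $x_1x_4$ (three hops apart) would raise the "cannot handle" error.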

Phase  2 

Run  the  ‘Primal-Dual  algorithm’  as  in  the  previous  section. 


We can also consider $\max_i nt(i)$ as our objective while setting the amount of communication defined by RL as a constraint.

Load  Balancing  with  Limited  Superposed  Links  in  a  Fixed  Structure 
Given a graph G, $G_Q$, and $RL^*$, find a decomposition that minimizes $\max_i nt(i)$ such that messages are only transferred through the edges of G and $RL \le RL^*$.

The same idea used for ‘Load Balancing with Limited Superposed Link’ can be used. As in Algorithm 3.5, which is for ‘Minimal Superposed Link in a Fixed Structure’, a network is constructed corresponding to the problem instance. The only difference is that the capacities of all the arcs of type $(n_k, t)$ are set to be equal, initially $\lceil |E_Q|/M \rceil$, the best balance possible. At each iteration, we minimize RL using Phase 2 of Algorithm 3.5. If we obtain $RL \le RL^*$, we have achieved $\min \max_i nt(i)$. Otherwise, we increase the capacity of all the arcs of type $(n_k, t)$ by one.

Algorithm  3.6 

1. Set $dummy = \lceil |E_Q|/M \rceil$

2. Run Step 1 through Step 3 of Phase 1 of Algorithm 3.5

3. Create the sink node t and make an arc from each $n_i$ to t. Each arc $(n_i, t)$ created in this step has capacity $dummy$ and cost 0.

4. Do while $dummy \le |E_Q|$

Run Phase 2 of Algorithm 3.5

If $RL \le RL^*$, $\min \max_i nt(i) = dummy$; terminate

$dummy := dummy + 1$ (and set the capacity of every arc $(n_i, t)$ to the new value of $dummy$)

5. Return “Infeasible”


3.2.3  Semi-flexible  organizational  structure 

In this section we will discuss the situation where the communication structure of an organization is fixed, but the task allocator has the freedom to assign decision variables, as well as subtasks, to subdivisions. This situation is best explained by the following mathematical statement of the formulation.


Mapping  for  Minimal  Superposed  Link 
Given $G_Q$, $G = (V, E)$, and $nt^*_i$ for $i = 1, 2, \ldots, M$, find a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition that minimize RL such that $nt(i) \le nt^*_i$, $i = 1, 2, \ldots, M$.


Mapping  for  Load  Balancing  with  Limited  Superposed  Link 
Given $G_Q$, $G = (V, E)$, and $RL^*$, find a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition that minimize $\max_i nt(i)$ such that $RL \le RL^*$.


The one-to-one mapping $\sigma$ mathematically represents the assignment of decision variables to subdivisions. Each variable $x_i$ represents the decision which a subdivision is delegated to make, where the global task is to make a decision $(x_1, x_2, \ldots, x_M)$ that minimizes the cost function J.

Here is the recognition version of these two problems:

Problem  1 

Given $G_Q = (V_Q, E_Q)$, $G = (V, E)$, $nt^*_i$ for $i = 1, 2, \ldots, M$, and $RL^*$, does there exist a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition such that

$$nt(i) \le nt^*_i, \quad i = 1, 2, \ldots, M,$$

$$RL \le RL^* \, ?$$


Theorem  3-14 

The recognition version of the two problems above (Problem 1) is NP-complete.

Proof 

Let us restrict the problem by setting

$$nt^*_i = |E_Q| \quad \forall i$$

and

$$RL^* = |E_Q|$$

This restricted version is then equivalent to the ‘subgraph isomorphism’ problem [9]. The restricted version is

Problem 2

INSTANCE: $G_Q = (V_Q, E_Q)$, $G = (V, E)$

PROBLEM : $nt^*_i = |E_Q| \ \forall i$;

does there exist a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition such that

$$nt(i) \le nt^*_i, \quad i = 1, 2, \ldots, M, \quad \text{and} \quad RL \le |E_Q| \, ?$$
Subgraph Isomorphism

INSTANCE: $G_Q = (V_Q, E_Q)$, $G = (V, E)$

PROBLEM : Is $G_Q$ isomorphic to a subgraph of G?

Suppose the instance of Problem 2 is ‘yes’. For the corresponding one-to-one mapping $\sigma$, the assignment of one cross term increases RL by either 1 or 2. Since we have $|E_Q|$ cross terms and $RL \le |E_Q|$, RL must be increased by only 1 for the assignment of each cross term $x_i x_j$. Therefore, there must exist a link between the nodes $\sigma(x_i)$ and $\sigma(x_j)$. (Otherwise, the assignment of $x_i x_j$ increases RL by 2.) Therefore, $G_Q$ can be embedded in G.

Suppose the instance of ‘Subgraph Isomorphism’ is ‘yes’. Take the subgraph of G which is isomorphic to $G_Q$. We can define the mapping such that $\sigma(x_i)$ is the node of G corresponding, in this isomorphism, to the node of $G_Q$ representing $x_i$. As far as the decomposition is concerned, assign each cross term $x_i x_j$ either to $\sigma(x_i)$ or to $\sigma(x_j)$. This way, the assignment of each cross term increases RL by only 1, so RL is $|E_Q|$ in the end. Since the number of cross terms is $|E_Q|$, $nt(i) \le |E_Q|$ for any decomposition. Thus, the instance of Problem 2 is ‘yes’.

We have shown that the subgraph isomorphism problem is polynomially transformed to Problem 2. The subgraph isomorphism problem is known to be NP-complete, and Problem 2 is clearly in NP. Therefore, Problem 2 is NP-complete. Since Problem 2 is a special case of Problem 1, and Problem 1 is also in NP, Problem 1 is NP-complete.
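The equivalence used in this proof can be checked on tiny graphs by brute force. The Python sketch below (hypothetical function names) decides Problem 2 directly: with $nt^*_i = |E_Q|$ no capacity binds, so each cross term independently goes to its cheapest processor (cost 1 if $\sigma(x_i)$ and $\sigma(x_j)$ are linked, 2 if they only share a neighbor, impossible otherwise under Direct Communication). It then compares the answer with a brute-force subgraph-isomorphism test.

```python
from itertools import permutations


def _adjacency(vertices, edges):
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def problem2(vq, eq, v, e):
    """Brute-force Problem 2: some one-to-one sigma and decomposition with RL <= |E_Q|?"""
    adj = _adjacency(v, e)
    for image in permutations(v, len(vq)):
        sigma = dict(zip(vq, image))
        rl = 0
        for i, j in eq:
            a, b = sigma[i], sigma[j]
            if b in adj[a]:
                rl += 1  # kept by sigma(x_i) or sigma(x_j)
            elif adj[a] & adj[b]:
                rl += 2  # sent to a common neighbour
            else:
                break  # this sigma cannot place x_i x_j at all
        else:
            if rl <= len(eq):
                return True
    return False


def subgraph_isomorphism(vq, eq, v, e):
    """Brute force: is G_Q isomorphic to a subgraph of G?"""
    adj = _adjacency(v, e)
    return any(all(s[j] in adj[s[i]] for i, j in eq)
               for s in (dict(zip(vq, im)) for im in permutations(v, len(vq))))


tri_v, tri_e = [1, 2, 3], [(1, 2), (2, 3), (1, 3)]
path_v, path_e = ["a", "b", "c"], [("a", "b"), ("b", "c")]
kite_v = ["a", "b", "c", "d"]
kite_e = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
for g in [(path_v, path_e), (kite_v, kite_e)]:
    print(problem2(tri_v, tri_e, *g), subgraph_isomorphism(tri_v, tri_e, *g))
# False False, then True True
```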

Let us consider a special case of our problem, namely

$$|V| = |V_Q|$$

If we restrict the general subgraph isomorphism problem to the special case where $|V| = |V_Q|$, this special problem is still NP-complete. (If we further restrict to instances where $G_Q$ is a ring, it is equivalent to the Hamiltonian circuit problem.) Therefore, the following special cases of our problems are still NP-complete.

Problem 3

INSTANCE: $G_Q = (V_Q, E_Q)$, $G = (V, E)$, $|V| = |V_Q|$

PROBLEM : $nt^*_i = |E_Q| \ \forall i$;

does there exist a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition such that

$$nt(i) \le nt^*_i, \quad i = 1, 2, \ldots, M, \quad \text{and} \quad RL \le |E_Q| \, ?$$


Problem 4

INSTANCE: $G_Q = (V_Q, E_Q)$ is a ring, $G = (V, E)$, $|V| = |V_Q|$

PROBLEM : $nt^*_i = |E_Q| \ \forall i$; does there exist a one-to-one mapping

$$\sigma : \{x_1, x_2, \ldots, x_M\} \to V$$

and a decomposition such that

$$nt(i) \le nt^*_i, \quad i = 1, 2, \ldots, M, \quad \text{and} \quad RL \le |E_Q| \, ?$$


Figures of Chapter 3

Figure 3-2  Illustration for Algorithm 3.3 (example: a quadratic cost $J(x_1, \ldots, x_5)$ with square terms $10x_i^2$ and cross terms including $x_1x_2$, $x_1x_4$ and $x_3x_4$, on a fixed tree $T = (V, E_T)$; J has a cross term $x_1x_4$, but $DM_1$ and $DM_4$ are three hops apart)

Figure 3-3 (c), (d)  Algorithm 3.3 for Fixed Tree

Figure 3-5  Locality of interaction (processors to which $x_i x_j$ can be assigned; panels (a) and (b))

Figure 3-6  State vector $U(t)$ (at the beginning of step t)

Figure 3-7  Evolution of the state vector U
Chapter 4. INDIRECT COMMUNICATION


4.1  Introduction 

In Chapter 3, we assumed that if $J^i$ depends upon $x_j$, there must exist a link between $DM_i$ and $DM_j$. In this chapter we relax this assumption. We assume that as long as there is an undirected path between $DM_i$ and $DM_j$, $J^i$ can have $x_j$ as its variable and/or $J^j$ can have $x_i$ as its variable in the decomposition of J. (A path is defined to be a sequence of edges of the form $(DM_1, DM_2), (DM_2, DM_3), (DM_3, DM_4), \ldots, (DM_{l-1}, DM_l)$.) Let us recapitulate the assumptions under which we continue our discussion.

Let $J = x^T Q x$ be the global cost function and $G_Q$ be the graph that represents J:

$$G_Q = (V_Q, E_Q)$$

$$V_Q = \{1, 2, \ldots, M\}$$

$$E_Q = \{\text{unordered pair } (i,j) \mid x_i x_j \text{ is in } J\}$$

Assumption 1 : If $J^i$ depends upon $x_j$, there is a path between $DM_i$ and $DM_j$.

(‘Indirect Communication’ assumption)

Assumption 2 : $G_Q$ is connected, and the diagonal elements of Q are nonzero.
Assumption 3 : Each square term $x_i^2$ is in $J^i$ in the decomposition.

Under these assumptions, the amount of communication (TL or RL) and the balance of load ($\max_i nt(i)$) are to be optimized in the decomposition of J. It has been shown in Chapter 2 that if TL is chosen as a measure of the amount of communication, the optimization becomes trivial. In this chapter the optimization of RL and $\max_i nt(i)$ is discussed.


4.2  Flexible  organizational  structure 

Let us consider the case where the capacity of each processor is fixed, and the load on the processors is considered balanced as long as the load of each processor is within its capacity. The objective, then, is to minimize the total amount of necessary communication.

Minimal Superposed Link

Given $J = x^T Q x$ and $nt^*_i$, find a decomposition that minimizes RL such that $nt(i) \le nt^*_i$, $i = 1, 2, \ldots, M$.

The decomposition of a global cost function can be regarded as an assignment of all the cross terms of the global cost function to processors, that is, a mapping from the set of cross terms to the set of processors. If a cross term $x_i x_j$ is to be assigned to a processor $DM_k$ where $k \ne i$ and $k \ne j$, the updated value of $x_i$ must be continually sent from $DM_i$ to $DM_k$, and the updated value of $x_j$ must be continually sent from $DM_j$ to $DM_k$. Also, updated values of the partial derivatives of the subcost function $J^k$ must be continually sent from $DM_k$ to both $DM_i$ and $DM_j$. Hence there must be communication between $DM_i$ and $DM_k$, and also between $DM_j$ and $DM_k$. Since the organizational structure is flexible and ‘indirect communication’ is allowed, the task allocator has the freedom to choose the communication path between each pair of processors that need to communicate with each other. Let the chosen paths be $P_{ik}$ and $P_{jk}$, respectively. For this assignment of $x_i x_j$, RL is increased by

$$mt(x_i x_j, DM_k, P_{ik}, P_{jk}) = l(DM_i, DM_k, P_{ik}) + l(DM_j, DM_k, P_{jk})$$

where $l(DM_i, DM_k, P_{ik})$ is the length of the path $P_{ik}$ from $DM_i$ to $DM_k$, and $l(DM_j, DM_k, P_{jk})$ is the length of the path $P_{jk}$ from $DM_j$ to $DM_k$. ($mt$ stands for ‘message transmission’.)

If a cross term $x_i x_j$ is to be assigned to $DM_i$ or $DM_j$, there must be communication between $DM_i$ and $DM_j$. Again, the task allocator has the freedom to choose this communication path. Let the chosen path be $P_{ij}$. RL for this assignment of $x_i x_j$ is then increased by

$$mt(x_i x_j, DM_k, P_{ij}) = l(DM_i, DM_j, P_{ij}), \quad k \in \{i, j\}$$

Therefore, given a decomposition (a mapping from the set of cross terms to the set of processors) and a path selection between each pair of processors that need to communicate according to this decomposition,

$$RL = \sum_{(i,j) \in E_Q} mt(x_i x_j, \cdot)$$

where the remaining arguments of each $mt$ term are the processor holding $x_i x_j$ and the paths chosen for it. The problem is to find a decomposition and a path selection that minimize this RL.


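We can observe that for a cross term $x_i x_j$ assigned to $DM_k$, $k \ne i$, $k \ne j$, the best path between $DM_i$ and $DM_k$ is the direct path, and so is the path between $DM_j$ and $DM_k$: every path has length at least 1, and since the structure is flexible, a direct link of length 1 can always be established. Therefore,

$$l(DM_i, DM_k, P^d_{ik}) = 1, \qquad l(DM_j, DM_k, P^d_{jk}) = 1$$

where $P^d_{ik}$ is the direct path from $DM_i$ to $DM_k$, and $P^d_{jk}$ is the direct path from $DM_j$ to $DM_k$ ($P^d_{ik} = (DM_i, DM_k)$ and $P^d_{jk} = (DM_j, DM_k)$). Therefore,

$$mt(x_i x_j, DM_k, P^d_{ik}, P^d_{jk}) = 2$$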
For a cross term $x_i x_j$ assigned to $DM_i$ or $DM_j$, the best path between $DM_i$ and $DM_j$ is again the direct path:

$$mt(x_i x_j, DM_i, P^d_{ij}) = 1 \quad \text{and} \quad mt(x_i x_j, DM_j, P^d_{ij}) = 1$$

for the direct path $P^d_{ij} = (DM_i, DM_j)$. Therefore, once a decomposition is determined, the selected paths ought to be direct ones. The problem of selecting communication paths is thus embedded in the problem of decomposition, so ‘Minimal Superposed Link’ can be rewritten as

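$$\begin{aligned}
\text{minimize} \quad & \sum_{(i,j) \in E_Q} \Big( \sum_{k \ne i,\, k \ne j} 2 x_{ijk} + \sum_{k = i} x_{ijk} + \sum_{k = j} x_{ijk} \Big) \\
\text{such that} \quad & \sum_{k=1}^{M} x_{ijk} = 1 \quad \forall (i,j) \in E_Q \\
& \sum_{(i,j)} x_{ijk} \le nt^*_k, \quad k = 1, 2, \ldots, M \\
& x_{ijk} = 0 \quad \text{for } (i,j) \notin E_Q \\
& x_{ijk} \in \{0, 1\}
\end{aligned}$$

where

$$E_Q = \{\text{unordered pair } (i,j) \mid \text{the cross term } x_i x_j \text{ is in } J\}$$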

This is exactly the same as the linear programming formulation of the ‘Minimal Superposed Link’ problem under the ‘Direct Communication’ assumption (Chapter 3). The conclusion is that the ‘Minimal Superposed Link’ problem under the ‘Indirect Communication’ assumption is exactly equivalent to the one under the ‘Direct Communication’ assumption. We gain nothing by relaxing this constraint, because direct communications are the best way to obtain a small RL.


Load  Balancing  with  Limited  Superposed  Link 
Given $J = x^T Q x$ and $RL^*$, find a decomposition that minimizes $\max_i nt(i)$ such that $RL \le RL^*$.

Since direct communications are the best way to obtain a small RL and the constraint is $RL \le RL^*$, we can again find an answer to this problem by solving ‘Load Balancing with Limited Superposed Link’ under the ‘Direct Communication’ assumption.
4.3 Fixed organizational structure

As discussed in the previous section, decomposing a global cost function is exactly equivalent to assigning all the cross terms to processors. If a task allocator or algorithm wants to assign a cross term $x_i x_j$ to a processor $DM_k$, $k \ne i$, $k \ne j$, he must ensure communication paths between $DM_i$ and $DM_k$ and between $DM_j$ and $DM_k$. Unlike the case of a ‘flexible organizational structure’, the choice of these communication paths is not completely free. Since the organizational structure is fixed, a communication route between a pair of processors must be chosen from the set of paths between the corresponding pair of nodes in the graph representing the organizational structure. The increment of RL for the assignment of this cross term is again the sum of the lengths of these two routes. (If $x_i x_j$ is assigned to either $DM_i$ or $DM_j$, the increment of RL is the length of the chosen route between $DM_i$ and $DM_j$.) Given any assignment of a cross term, the route that minimizes the increment of RL is the shortest path between the two processors that need to communicate. (In the case of a flexible organizational structure, the shortest path was always the direct path. In fact, we can see the flexible organizational structure as a special case of the fixed organizational structure, namely a complete graph.)

Minimal  Message  Ambulation 
Given a graph G, $G_Q$ and $nt^*_i$, find a decomposition that minimizes

$$\sum_{k} \; \sum_{x_i x_j \text{ is in } J^k} \big( nd(DM_i, DM_k) + nd(DM_j, DM_k) \big)$$

such that

$$nt(i) \le nt^*_i, \quad i = 1, 2, \ldots, M$$

where $nd(DM_l, DM_m)$ is the length of the shortest path between $DM_l$ and $DM_m$, assuming every edge has length 1 ($nd(DM_l, DM_m) = 0$ if $l = m$). ($nd$ stands for ‘nominal distance’.)

(We can cast off the constraint on the capacity of each processor by letting $nt^*_i = \infty$, $\forall i$.)

The idea for solving this problem is, as in Chapter 3, to transform the problem into a network flow problem. A network associated with this problem is created by building one node $m_{ij}$ for each cross term $x_i x_j$ and one node $n_k$ for each agent $DM_k$ of the organization. Then arcs connecting the m's and n's are built, each with a cost. More specifically, for each arc $(m_{ij}, n_k)$, let the cost be the sum of the shortest-path distance between $DM_i$ and $DM_k$ and the shortest-path distance between $DM_j$ and $DM_k$. The solution of the min-cost flow problem on this network gives the solution of ‘Minimal Message Ambulation’.


Algorithm  4.1 


Phase  1 

1. Create $|E_Q|$ nodes corresponding to cross terms of J. Let $m_{ij}$ denote the node corresponding to $x_i x_j$.

2. Create M nodes corresponding to processors. Let $n_i$ denote the node corresponding to $DM_i$.

3. For each pair of nodes $\{m_{ij}, n_k\}$, make an edge $(m_{ij}, n_k)$.

All the edges created in Step 3 have infinite capacity.

4. Let the distance of every edge in G be 1.

Compute the distance of the shortest path between each pair of nodes in G.

(Let us use $sl(i,j)$ to denote the distance of the shortest path between the pair $\{DM_i, DM_j\}$.)

5. For each edge $(m_{ij}, n_k)$, let

$$cost = \begin{cases} sl(i,j) & \text{if } i = k \text{ or } j = k \\ sl(i,k) + sl(j,k) & \text{if } i \ne k \text{ and } j \ne k \end{cases}$$

6. Create the sink node t and make an edge from each $n_i$ to t. Each edge $(n_i, t)$ created at Step 6 has capacity $nt^*_i$.


Phase  2 

Run  ‘Primal-Dual  algorithm’. 
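Steps 4 and 5 of Algorithm 4.1 can be sketched in Python with a breadth-first search from every node, assuming G is connected (Assumption 1) and given as an edge list over processor indices; the function names are hypothetical. When the capacities are dropped ($nt^*_i = \infty$), the min-cost flow separates by cross term, so the optimum is simply each term's cheapest processor, which the usage lines exploit.

```python
from collections import deque


def hop_distances(nodes, edges):
    """sl(i, j): all-pairs shortest-path lengths in G with unit edge lengths (BFS)."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    sl = {}
    for src in nodes:
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        sl[src] = dist
    return sl


def ambulation_costs(cross_terms, nodes, edges):
    """Cost of each edge (m_ij, n_k) from Step 5 of Algorithm 4.1 (G assumed connected)."""
    sl = hop_distances(nodes, edges)
    return {((i, j), k): sl[i][j] if k in (i, j) else sl[i][k] + sl[j][k]
            for (i, j) in cross_terms for k in nodes}


# Fixed structure: the path DM_1 - DM_2 - DM_3 - DM_4.
nodes, path = [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]
terms = [(1, 3), (1, 4)]
cost = ambulation_costs(terms, nodes, path)
print([cost[(1, 3), k] for k in nodes])  # [2, 2, 2, 4]
# Uncapacitated case: each term goes to a cheapest processor.
print(sum(min(cost[t, k] for k in nodes) for t in terms))  # 5
```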


Algorithm 4.1 can be applied to a more general version of the problem, in which communication overhead is defined more generally. Let each link of $G$ have its own cost of delivering a message; say, the cost of transmitting through $(DM_i, DM_j)$ is $c(i,j)$. If a cross term $x_i x_j$ is to be assigned to a processor $DM_k$, where $k \neq i$ and $k \neq j$, there must be communication between $DM_i$ and $DM_k$, and also between $DM_j$ and $DM_k$. Let communication paths $P_{ik}$ and $P_{jk}$ be chosen. For this assignment of $x_i x_j$, the communication overhead is increased by

$$
\mathrm{Cost}(x_i x_j, DM_k, P_{ik}, P_{jk}) = \sum_{(p,q) \in P_{ik}} c(p,q) + \sum_{(p,q) \in P_{jk}} c(p,q)
$$

If a cross term $x_i x_j$ is to be assigned to $DM_i$ or $DM_j$, there must be communication between $DM_i$ and $DM_j$. Let the chosen path be $P_{ij}$. The communication overhead is then increased for this assignment of $x_i x_j$ by

$$
\mathrm{Cost}(x_i x_j, P_{ij}) = \sum_{(p,q) \in P_{ij}} c(p,q)
$$

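Therefore, given a decomposition (that is, a mapping from the set of cross terms to the set of processors) and a path selection between each pair of processors that need to communicate according to this decomposition, the communication overhead is the sum, over all cross terms $x_i x_j$ of $J$, of the cost of their assignments:

$$
\text{Communication Overhead} = \sum_{x_i x_j \text{ in } J} \mathrm{Cost}(x_i x_j)
$$

where each summand is $\mathrm{Cost}(x_i x_j, DM_k, P_{ik}, P_{jk})$ or $\mathrm{Cost}(x_i x_j, P_{ij})$, depending on the processor $DM_k$ to which $x_i x_j$ is assigned.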
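The two cost expressions and their sum can be evaluated directly for a given assignment and path selection. In the illustrative sketch below, links are undirected with symmetric costs $c(i,j) = c(j,i)$ stored under `frozenset` keys, each chosen path is a list of processor indices keyed by the sorted pair of its endpoints, and the link costs in the example are hypothetical. Helper names are our own.

```python
def path_cost(path, c):
    """Sum of c(p, q) over the consecutive links (p, q) of a path given as a node list."""
    return sum(c[frozenset(link)] for link in zip(path, path[1:]))


def chosen_path(paths, a, b):
    """Look up the chosen path between DM_a and DM_b regardless of orientation."""
    return paths[(min(a, b), max(a, b))]


def assignment_cost(i, j, k, paths, c):
    """Overhead added when cross term x_i x_j is assigned to DM_k."""
    if k in (i, j):
        return path_cost(chosen_path(paths, i, j), c)  # Cost(x_i x_j, P_ij)
    return (path_cost(chosen_path(paths, i, k), c)     # Cost(x_i x_j, DM_k, P_ik, P_jk)
            + path_cost(chosen_path(paths, j, k), c))


def communication_overhead(assignment, paths, c):
    """Sum of assignment costs; `assignment` maps each cross term (i, j) to k."""
    return sum(assignment_cost(i, j, k, paths, c) for (i, j), k in assignment.items())


# Hypothetical example: line network DM_0 - DM_1 - DM_2 with unequal link costs
c = {frozenset({0, 1}): 2.0, frozenset({1, 2}): 5.0}
paths = {(0, 1): [0, 1], (1, 2): [1, 2], (0, 2): [0, 1, 2]}
assignment = {(0, 1): 0, (1, 2): 2, (0, 2): 1}
print(communication_overhead(assignment, paths, c))  # 14.0
```

Placing $x_0 x_2$ at $DM_1$ requires both $P_{01}$ and $P_{21}$, so it costs $2 + 5 = 7$, the same as the single path $P_{02}$ through $DM_1$.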

Algorithm 4.1 can be modified to minimize this Communication Overhead. We only modify Step 4 of Algorithm 4.1.

Algorithm  4.2 
Phase  1 

1. Create $|E_Q|$ nodes corresponding to the cross terms of $J$. Let $m_{ij}$ denote the node corresponding to $x_i x_j$.

2. Create $M$ nodes corresponding to the processors. Let $n_i$ denote the node corresponding to $DM_i$.

3. For each pair of nodes $\{m_{ij}, n_k\}$, make an edge $(m_{ij}, n_k)$.

All  the  edges  created  in  Step  3  have  infinite  capacity. 


4. The distance of each edge $(DM_i, DM_j)$ in $G$ is $c(i,j)$.

Compute  the  distance  of  the  shortest  path  between  each  pair  of  nodes  in  G. 

(Let us use $sl(i,j)$ to denote the length of the shortest path between the pair $\{DM_i, DM_j\}$.)

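5. For each edge $(m_{ij}, n_k)$,

$$
\mathrm{cost}(m_{ij}, n_k) =
\begin{cases}
sl(i,j) & \text{if } i = k \text{ or } j = k \\
sl(i,k) + sl(j,k) & \text{if } i \neq k \text{ and } j \neq k
\end{cases}
$$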

6. Create the sink node $t$ and make an edge from each $n_i$ to $t$. Each edge $(n_i, t)$ created in this step has capacity $nt_i^*$.


Phase  2 

Run  ‘Primal-Dual  algorithm’. 
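Only Step 4 differs from Algorithm 4.1, so the sketch below shows just that step, together with the Step 5 edge costs; Phase 2 is unchanged. For the all-pairs shortest paths under link costs $c(i,j)$ we use Floyd-Warshall, one standard method chosen here for brevity. The link costs are hypothetical and chosen so that a relayed route is cheaper than an expensive direct link; function names are illustrative.

```python
INF = float("inf")


def weighted_distances(num_dm, c):
    """Step 4 of Algorithm 4.2: sl(i, j) under link costs c(i, j), via Floyd-Warshall."""
    sl = [[0 if i == j else INF for j in range(num_dm)] for i in range(num_dm)]
    for (a, b), cost in c.items():
        sl[a][b] = sl[b][a] = min(sl[a][b], cost)  # links are bidirectional
    for k in range(num_dm):
        for i in range(num_dm):
            for j in range(num_dm):
                if sl[i][k] + sl[k][j] < sl[i][j]:
                    sl[i][j] = sl[i][k] + sl[k][j]
    return sl


def edge_costs(sl, cross_terms, num_dm):
    """Step 5: cost of edge (m_ij, n_k) for every cross term and processor."""
    return {
        ((i, j), k): sl[i][j] if k in (i, j) else sl[i][k] + sl[j][k]
        for (i, j) in cross_terms
        for k in range(num_dm)
    }


c = {(0, 1): 1, (1, 2): 1, (0, 2): 10}  # hypothetical: direct link 0-2 is expensive
sl = weighted_distances(3, c)
costs = edge_costs(sl, [(0, 2)], 3)
print(sl[0][2], costs[((0, 2), 0)], costs[((0, 2), 1)])  # 2 2 2
```

Because $sl(k,k) = 0$, the first case of Step 5 coincides with the second formula evaluated at $k = i$ or $k = j$; the case split mainly makes the intent explicit.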


We can also consider $\max_i nt(i)$ as our objective while setting the amount of communication defined by $RL$ as a constraint.

Load  Balancing  with  Limited  Message  Ambulation 
Given a graph $G$, $G_Q$, and $RL^*$, find a decomposition that minimizes $\max_i nt(i)$ such that

$$
\sum_k \sum_{x_i x_j \text{ in } J^k} \left[ nd(DM_i, DM_k) + nd(DM_j, DM_k) \right] \le RL^*
$$

We can apply the same idea that was used in Chapter 3 under the ‘Direct Communication’ assumption.


Algorithm  4.3 


1. Set dummy = 1

2. Run Steps 1 through 5 of Phase 1 of Algorithm 4.2.

3. Create the sink node $t$ and make an arc from each $n_i$ to $t$. Each arc $(n_i, t)$ created in this step has capacity dummy and cost 0.

4. Do while dummy $\le |E_Q|$

Run Phase 2 of Algorithm 4.2

If $RL \le RL^*$, then $\min \max_i nt(i) =$ dummy; terminate.

dummy  :=  dummy  +  1 

5.  Return  “Infeasible” 
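The loop of Algorithm 4.3 can be sketched as follows. To keep the example short and self-contained, we replace the min-cost flow of Phase 2 with an exhaustive search over all assignments that respect the capacity `dummy`; this yields the same minimal $RL$ but is exponential and suitable only for tiny instances. We also assume that $nt(i)$ counts the cross terms assigned to $DM_i$. Function names are hypothetical.

```python
from itertools import product


def min_rl_with_capacity(sl, cross_terms, num_dm, cap):
    """Stand-in for Phase 2: exhaustive search over assignments with at most
    `cap` cross terms per processor (exponential; tiny instances only)."""
    best = None
    for assign in product(range(num_dm), repeat=len(cross_terms)):
        if any(assign.count(k) > cap for k in range(num_dm)):
            continue
        rl = sum(sl[i][j] if k in (i, j) else sl[i][k] + sl[j][k]
                 for (i, j), k in zip(cross_terms, assign))
        if best is None or rl < best:
            best = rl
    return best


def algorithm_4_3(sl, cross_terms, num_dm, rl_star):
    """Raise the common capacity `dummy` until the minimal RL meets RL*."""
    dummy = 1                                      # Step 1
    while dummy <= len(cross_terms):               # Step 4
        rl = min_rl_with_capacity(sl, cross_terms, num_dm, dummy)
        if rl is not None and rl <= rl_star:
            return dummy                           # min max_i nt(i) = dummy
        dummy += 1
    return "Infeasible"                            # Step 5


sl = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # hop distances on DM_0 - DM_1 - DM_2
terms = [(0, 1), (1, 2), (0, 2)]
print(algorithm_4_3(sl, terms, 3, rl_star=4))  # 1
print(algorithm_4_3(sl, terms, 3, rl_star=3))  # Infeasible
```

Since the minimal $RL$ can only decrease as the common capacity grows, the first value of dummy that meets $RL^*$ is the smallest achievable maximum load.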


4.4  Semi-flexible  organizational  structure 

Mapping  for  Minimal  Message  Ambulation 
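Given $G_Q$, $G = (V, E)$, and $nt_i^*$ for $i = 1, 2, \ldots, M$, find a one-to-one mapping

$$
\sigma : \{x_1, x_2, \ldots, x_M\} \to V
$$

and a decomposition that minimize

$$
\sum_k \sum_{x_i x_j \text{ in } J^k} \left[ nd(\sigma(x_i), \sigma(x_k)) + nd(\sigma(x_j), \sigma(x_k)) \right]
$$

such that $nt(i) \le nt_i^*$, $i = 1, 2, \ldots, M$.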
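For a fixed mapping $\sigma$ and decomposition, the objective above is easy to evaluate; the difficulty lies entirely in searching over mappings and decompositions. The sketch below assumes that $nd$ is the hop distance in $G$, that a decomposition is a dictionary from $k$ to the cross terms $(i, j)$ in $J^k$, and, for illustration only, that $nt(i)$ is the number of cross terms in $J^i$. Processor names and helper names are hypothetical.

```python
from collections import deque


def hop_table(nodes, links):
    """nd(u, v): hop distance between every pair of processors in G (BFS)."""
    adj = {v: [] for v in nodes}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    table = {}
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        table[src] = dist
    return table


def message_ambulation(sigma, decomposition, nd):
    """Objective: sum over k and x_i x_j in J^k of
    nd(sigma(x_i), sigma(x_k)) + nd(sigma(x_j), sigma(x_k))."""
    return sum(nd[sigma[i]][sigma[k]] + nd[sigma[j]][sigma[k]]
               for k, terms in decomposition.items() for i, j in terms)


def within_capacity(decomposition, nt_star):
    """Assumed load measure: nt(i) = number of cross terms placed in J^i."""
    return all(len(decomposition.get(i, [])) <= cap for i, cap in nt_star.items())


nd = hop_table(["A", "B", "C"], [("A", "B"), ("B", "C")])
sigma = {0: "A", 1: "B", 2: "C"}  # x_0 -> A, x_1 -> B, x_2 -> C
decomposition = {0: [(0, 1)], 1: [(1, 2)], 2: [(0, 2)]}
print(message_ambulation(sigma, decomposition, nd))       # 4
print(within_capacity(decomposition, {0: 1, 1: 1, 2: 1}))  # True
```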

‘Mapping for Minimal Message Ambulation’ is very similar to the ‘Mapping for Minimal Superposed Link’ problem. The only difference is that we relax the ‘direct communication’ constraint. Let us restate the ‘Mapping for Minimal Superposed Link’ problem.


Mapping  for  Minimal  Superposed  Link 
Given $G_Q$, $G = (V, E)$, and $nt_i^*$ for $i = 1, 2, \ldots, M$, find a one-to-one mapping

$$
\sigma : \{x_1, x_2, \ldots, x_M\} \to V
$$

and a decomposition that minimize $RL$ such that $nt(i) \le nt_i^*$, $i = 1, 2, \ldots, M$.

‘Mapping for Minimal Superposed Link’ is basically a combinatorial optimization problem. Its feasible domain is the set of pairs $(\sigma, \text{decomposition})$ such that:

1. If cross term $x_i x_j$ is assigned to $\sigma(x_k)$, $k \neq i$, $k \neq j$, a link exists in $G$ between $\sigma(x_i)$ and $\sigma(x_k)$ and between $\sigma(x_j)$ and $\sigma(x_k)$.

2. If cross term $x_i x_j$ is assigned to $\sigma(x_i)$ or $\sigma(x_j)$, a link exists in $G$ between $\sigma(x_i)$ and $\sigma(x_j)$.

The feasible domain is defined this way because we have presupposed the following assumptions.

Assumption 1: If $J^i$ depends upon $x_j$, there is a link between $DM_i$ and $DM_j$. (‘Direct Communication’ assumption)

Assumption 2: $G_Q$ is connected, and the diagonal elements of $Q$ are non-zero.

Assumption 3: Each square term $x_i^2$ is in $J^i$ in the decomposition.
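Conditions 1 and 2 can be checked mechanically for a candidate pair $(\sigma, \text{decomposition})$. In this minimal sketch, assumed for illustration, $\sigma$ is a dictionary from variable index to processor name, a decomposition maps $k$ to the cross terms $(i, j)$ in $J^k$, and $G$ is given as a list of undirected links; the function name is hypothetical.

```python
def feasible_direct(sigma, decomposition, links):
    """Conditions 1 and 2 of the feasible domain under 'Direct Communication'."""
    edges = {frozenset(link) for link in links}

    def linked(a, b):
        return frozenset((sigma[a], sigma[b])) in edges

    for k, terms in decomposition.items():
        for i, j in terms:
            if k in (i, j):
                ok = linked(i, j)                   # condition 2
            else:
                ok = linked(i, k) and linked(j, k)  # condition 1
            if not ok:
                return False
    return True


links = [("A", "B"), ("B", "C")]
sigma = {0: "A", 1: "B", 2: "C"}
print(feasible_direct(sigma, {1: [(0, 2)]}, links))  # True: A-B and C-B are links
print(feasible_direct(sigma, {0: [(0, 2)]}, links))  # False: no link A-C
```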

‘Mapping for Minimal Message Ambulation’ is formulated from ‘Mapping for Minimal Superposed Link’ by relaxing Assumption 1. It can be stated as follows:

Problem  4.1 

Given $G_Q$, $G = (V, E)$, and $nt_i^*$ for $i = 1, 2, \ldots, M$, find a one-to-one mapping

$$
\sigma : \{x_1, x_2, \ldots, x_M\} \to V
$$

and a decomposition that minimize $RL$ such that $nt(i) \le nt_i^*$, $i = 1, 2, \ldots, M$, under the following assumptions:

Assumption 1: If $J^i$ depends upon $x_j$, there is a path between $DM_i$ and $DM_j$. (‘Indirect Communication’ assumption)

Assumption 2: $G_Q$ is connected, and the diagonal elements of $Q$ are non-zero.

Assumption 3: Each square term $x_i^2$ is in $J^i$ in the decomposition.


Therefore, the feasible domain of this combinatorial problem is the set of pairs $(\sigma, \text{decomposition})$ such that:

1. If cross term $x_i x_j$ is assigned to $\sigma(x_k)$, $k \neq i$, $k \neq j$, a path exists in $G$ between $\sigma(x_i)$ and $\sigma(x_k)$ and between $\sigma(x_j)$ and $\sigma(x_k)$.

2. If cross term $x_i x_j$ is assigned to $\sigma(x_i)$ or $\sigma(x_j)$, a path exists in $G$ between $\sigma(x_i)$ and $\sigma(x_j)$.
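Under the ‘Indirect Communication’ assumption, a path exists exactly when both processors lie in the same connected component of $G$, so checking conditions 1 and 2 reduces to comparing component labels. The sketch below computes the labels with union-find; the representation of $\sigma$ and of the decomposition, and the function names, are illustrative assumptions.

```python
def component_labels(nodes, links):
    """Union-find: label each processor with a representative of its component."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in links:
        parent[find(a)] = find(b)
    return {v: find(v) for v in nodes}


def feasible_indirect(sigma, decomposition, nodes, links):
    """Conditions 1 and 2 under 'Indirect Communication': a path must exist."""
    label = component_labels(nodes, links)

    def has_path(a, b):
        return label[sigma[a]] == label[sigma[b]]

    for k, terms in decomposition.items():
        for i, j in terms:
            needed = [(i, j)] if k in (i, j) else [(i, k), (j, k)]
            if not all(has_path(a, b) for a, b in needed):
                return False
    return True


nodes = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "C")]  # D is isolated
sigma = {0: "A", 1: "B", 2: "C", 3: "D"}
print(feasible_indirect(sigma, {0: [(0, 2)]}, nodes, links))  # True: path A-B-C
print(feasible_indirect(sigma, {3: [(0, 1)]}, nodes, links))  # False: D is unreachable
```

Note that the assignment of $x_0 x_2$ to $\sigma(x_0)$ is feasible here even though $A$ and $C$ are not directly linked, which is exactly what relaxing Assumption 1 permits.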

In a manner very similar to Problem 1 of Section 3.2.3 (the recognition version of ‘Mapping for Minimal Superposed Link’), the recognition version of Problem 4.1 turns out to be NP-complete.

INSTANCE: $G_Q$, $G$, $nt_i^*$ for $i = 1, 2, \ldots, M$, and $RL^*$

PROBLEM: Does there exist a one-to-one mapping $\sigma$ and a decomposition such that $RL \le RL^*$ and $nt(i) \le nt_i^*$ for $i = 1, 2, \ldots, M$?

Proof 

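If we restrict this problem by setting

$$
nt_i^* = |E_Q| \;\; \forall i, \qquad RL^* = |E_Q|,
$$

this restricted version is, again, equivalent to the ‘subgraph isomorphism’ problem. Since $\sigma$ is one-to-one, each cross term $x_i x_j$ adds at least $nd(\sigma(x_i), \sigma(x_j)) \ge 1$ to $RL$, and at least 2 if it is assigned to a third processor; hence $RL \le |E_Q|$ can hold only if every cross term is assigned to $\sigma(x_i)$ or $\sigma(x_j)$ and $\sigma(x_i)$, $\sigma(x_j)$ are adjacent in $G$, that is, only if $\sigma$ maps every edge of $G_Q$ onto an edge of $G$.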


Mapping  for  Load  Balancing  with  Limited  Message  Ambulation 
Given $G_Q$, $G = (V, E)$, and $RL^*$, find a one-to-one mapping

$$
\sigma : \{x_1, x_2, \ldots, x_M\} \to V
$$

and a decomposition that minimize $\max_i nt(i)$ under the constraint

$$
\sum_k \sum_{x_i x_j \text{ in } J^k} \left[ nd(\sigma(x_i), \sigma(x_k)) + nd(\sigma(x_j), \sigma(x_k)) \right] \le RL^*
$$

The  recognition  version  of  this  problem  is  identical  to  that  of  ‘Mapping  for 
Minimal  Message  Ambulation’. 


Since these problems are NP-complete, one approach is to consider special cases of them and see whether efficient algorithms can be found for those cases. Let us consider a special case of ‘Mapping for Minimal Message Ambulation’ in which the graph $G$ is linear:

$$
\begin{aligned}
G &= (V, E) \\
V &= \{DM_1, DM_2, \ldots, DM_M\} \\
E &= \{(DM_i, DM_{i+1}) \mid i = 1, 2, \ldots, M-1\}
\end{aligned}
$$

This problem models a situation where the organization has a linear communication network and the task allocator wants to assign jobs to agents of the organization with a minimal amount of necessary communication. It turns out that even this set of special instances is NP-complete. We can restrict this set of instances further by setting

$$
nt_i^* = |E_Q|, \qquad i = 1, 2, \ldots, M
$$

This means that all the processors have enough capacity, so the balance of load need not be considered. Now let us look at this doubly restricted set of instances. Let $\sigma$ be an arbitrary one-to-one mapping from $V_Q$ to $V$. For a cross term $x_i x_j$ in $J$, the simple path between $\sigma(i)$ and $\sigma(j)$ is unique because $G$ is linear. Therefore, as long as $x_i x_j$ is assigned to an agent $\sigma(k)$ on this path, $nd(\sigma(i), \sigma(k)) + nd(\sigma(j), \sigma(k))$ equals the length of this path, whereas any agent off the path gives a larger sum. This length is thus the minimal increase of $RL$ for the assignment of $x_i x_j$ under the mapping $\sigma$. Therefore, the recognition version of the set of instances being discussed is as follows:

Given $G_Q = (V_Q, E_Q)$, a linear graph $G = (V, E)$, and $RL^*$, does there exist a one-to-one mapping

$$
\sigma : V_Q \to V
$$

such that

$$
\sum_{(i,j) \in E_Q} nd(\sigma(i), \sigma(j)) \le RL^* \; ?
$$

Say $\sigma(i) = DM_{i'}$ and $\sigma(j) = DM_{j'}$. Then

$$
nd(\sigma(i), \sigma(j)) = |i' - j'|
$$

Therefore, this formulation is equivalent to the ‘Optimal Linear Arrangement’ problem [9], which is NP-complete.
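In this doubly restricted case the question is whether some ordering of the variables along the line keeps $\sum |i' - j'|$ within $RL^*$. The brute-force sketch below only illustrates the objective on tiny graphs (it enumerates all $|V_Q|!$ orderings) and is not an efficient algorithm; none is expected, since the problem is NP-complete. Function names are hypothetical.

```python
from itertools import permutations


def arrangement_cost(order, edges):
    """Sum over (i, j) in E_Q of |i' - j'|, where order[p] is mapped to the p-th agent."""
    position = {v: p for p, v in enumerate(order)}
    return sum(abs(position[i] - position[j]) for i, j in edges)


def optimal_linear_arrangement(vertices, edges):
    """Exhaustive search over all |V_Q|! one-to-one mappings (tiny instances only)."""
    best = min(permutations(vertices), key=lambda order: arrangement_cost(order, edges))
    return best, arrangement_cost(best, edges)


star = [(0, 1), (0, 2), (0, 3)]  # G_Q: x_0 is coupled with x_1, x_2, x_3
order, rl = optimal_linear_arrangement([0, 1, 2, 3], star)
print(rl)  # 4: x_0 placed at one of the two middle agents
```

The recognition question is then answered by comparing the returned cost with $RL^*$.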


Chapter  5.  SUMMARY  AND  EXTENSIONS 
5.1 Summary


In this thesis, a task allocation scheme for an organization is discussed. The objectives of allocation are reduction of individual load, speedy performance, and organizational security. A decentralized gradient-like algorithm for an additive cost function is used as a mathematical model for the behavior of an organization. In this algorithm, each processor $DM_i$ is responsible for the value of one variable $x_i$, and each processor $DM_i$ has the subcost function $J^i$. With each processor updating its own variable and communicating information with other processors, the algorithm achieves the goal of optimizing $J = \sum_i J^i$, the sum of the subcost functions.
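The behavior described above can be illustrated with a small synchronous sketch. For illustration only, it assumes a quadratic cost $J(x) = \sum q_{ij} x_i x_j$ whose terms are assigned to subcost functions $J^k$, a fixed step size $\gamma$, and synchronous updates; the algorithm studied in the thesis may differ in these details. The sketch also records which pairs $\{DM_k, DM_v\}$ must communicate, namely those for which $J^k$ depends upon $x_v$. All names are hypothetical.

```python
def gradient_iteration(x, terms, owner, gamma):
    """One synchronous sweep for J(x) = sum of q * x_i * x_j over `terms`.
    DM_k differentiates the terms of its own J^k and sends each partial
    derivative to the processor that owns the variable; DM_i then updates x_i."""
    received = [0.0] * len(x)
    messages = set()
    for (i, j), q in terms.items():
        k = owner[(i, j)]  # the term q x_i x_j belongs to J^k
        partials = {i: 2 * q * x[i]} if i == j else {i: q * x[j], j: q * x[i]}
        for v, d in partials.items():
            received[v] += d  # contribution dJ^k/dx_v delivered to DM_v
            if v != k:
                messages.add(frozenset((v, k)))  # DM_k and DM_v must communicate
    new_x = [xi - gamma * g for xi, g in zip(x, received)]
    return new_x, messages


terms = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 1.0}  # J = x_0^2 + x_1^2 + x_0 x_1
owner = {(0, 0): 0, (1, 1): 1, (0, 1): 0}        # J^0 = x_0^2 + x_0 x_1, J^1 = x_1^2
x = [1.0, -2.0]
for _ in range(200):
    x, messages = gradient_iteration(x, terms, owner, gamma=0.1)
print(max(abs(v) for v in x) < 1e-6)  # True: converges to the minimizer x = 0
print(messages == {frozenset({0, 1})})  # True: only DM_0 and DM_1 communicate
```

Moving the cross term $x_0 x_1$ from $J^0$ to $J^1$ leaves the iterates unchanged but still requires the pair $\{DM_0, DM_1\}$ to communicate; the choice of decomposition affects who talks to whom, not the values computed.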

Minimizing a global cost function is viewed as the global task of an organization. Each subcost function $J^i$ is associated with the task of a division of the organization. Therefore, decomposing a global cost function represents allocating a global task to divisions. How to decompose $J$ is the central issue of this thesis.

In a decentralized gradient-like algorithm for an additive cost function, if $J^i$ depends upon $x_j$, there must be communication between $DM_i$ and $DM_j$. The decomposition scheme is discussed under two different assumptions concerning the communication method for such a pair $\{DM_i, DM_j\}$. The ‘Direct Communication’ assumption mandates that there be a direct (bidirectional) communication link between $DM_i$ and $DM_j$. Decomposition of $J$ under this assumption is discussed in Chapter 3. The ‘Indirect Communication’ assumption allows communication messages between $DM_i$ and $DM_j$ to be relayed by other processors; it only requires that there exist an (undirected) communication path between $DM_i$ and $DM_j$. Decomposition of $J$ under this assumption is discussed in Chapter 4. In both chapters, the objective of decomposition is to balance the load on each processor and to keep the amount of communication small.


The  results  are  summarized  in  summary  charts  at  the  end  of  this  chapter. 

5.2  Extensions 

As mentioned in Section 5.1, if $J^i$ depends upon $x_j$, there must be a communication capability between $DM_i$ and $DM_j$. Such communication may take place through a single link or through a sequence of links, depending upon whether we adopt the ‘Direct Communication’ assumption or the ‘Indirect Communication’ assumption. In this thesis we assumed that every communication link is bidirectional. If unidirectional links are used instead, the problems formulated in this thesis must be modified. Instead of the undirected graph $G = (V, E)$ used in this thesis to represent the communication structure, the directed graph $G_d = (V, E_d)$ must be used. A directed edge $(DM_i, DM_j) \in E_d$ of $G_d$ signifies
that  the  information  of  x,  can  be  transmitted  from  DM,  to  DMj ,  and  the  informa¬ 
tion  of  A;  =  4rr-  can  be  transmitted  from  DM,  to  DM..  ‘Direct  Communication’ 
assumption must be modified as follows: if $J^j$ depends upon $x_i$, there must be a directed link $(DM_i, DM_j)$ in $G_d$. The ‘Indirect Communication’ assumption must be modified as follows: if $J^j$ depends upon $x_i$, there must be a directed path from $DM_i$ to $DM_j$ in $G_d$. In this thesis, $TL$ and $RL$ have been used as measures of the amount of communication. $RL$ can be used in these new formulations. However, $TL$ must be modified. $TL$ was defined to be the total number of bidirectional links required by a decomposition (in the case of a flexible organization) or built into an organization (in the case of a fixed or semi-flexible organization). In the new (extended) formulation, the number of unidirectional links must be counted. We can use $TDL$ (Total number of Directed Links) to denote this new measure of the amount of communication. Suppose $J^j$ depends upon $x_i$, and $J^i$ depends upon $x_j$ in a decomposition. $TL$ counts one link for the pair $\{DM_i, DM_j\}$, but $TDL$ counts two for this pair, because a unidirectional link $(DM_i, DM_j)$ and a unidirectional link $(DM_j, DM_i)$ are both required.
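For a flexible organization, where the links are exactly those required by the decomposition, the two counts can be compared directly. A minimal sketch, assuming each dependence is recorded as a pair `(j, i)` meaning that $J^j$ depends upon $x_i$ and therefore needs the directed link $(DM_i, DM_j)$; the function name is hypothetical.

```python
def tl_and_tdl(dependencies):
    """`dependencies` holds pairs (j, i) meaning "J^j depends upon x_i", i != j.
    Each pair requires the directed link (DM_i, DM_j); TL counts unordered pairs."""
    directed = {(i, j) for j, i in dependencies if i != j}
    undirected = {frozenset(link) for link in directed}
    return len(undirected), len(directed)


print(tl_and_tdl({(1, 0), (0, 1)}))  # (1, 2): mutual dependence
print(tl_and_tdl({(1, 0), (2, 0)}))  # (2, 2): one-way dependences only
```

The two measures differ only when dependences run in both directions between the same pair of processors.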

These new formulations are suggested for future research.
Summary chart for ‘Indirect Communication’
APPENDIX


In this Appendix, we present an example of a graph that has no inverse image under the transformation defined in Section 3.1.3.

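In Chapter 3, a transformation $F(G)$ of a graph $G = (V, E)$ and a set of graphs $\mathcal{E}$ were defined as follows:

$$
F(G) = (V, E_f), \qquad E_f = E \cup E_{\text{added}}
$$

where

$$
E_{\text{added}} = \{ (DM_i, DM_j) \notin E \mid DM_i \text{ and } DM_j \text{ are two hops from each other} \}
$$

$$
\mathcal{E} = \{ \text{graph } H \mid \exists \text{ a graph } T \text{ such that } F(T) = H \}
$$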
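The transformation $F$ can be computed directly. The sketch below reads ‘two hops from each other’ as ‘not adjacent but sharing a common neighbor’ (that is, at distance exactly 2), which is our interpretation; graphs are given as node and edge lists, and the function name is hypothetical.

```python
from itertools import combinations


def transform(nodes, edges):
    """F(G): keep E and add every pair of nodes that is two hops apart in G."""
    E = {frozenset(e) for e in edges}
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    added = {frozenset((u, v)) for u, v in combinations(nodes, 2)
             if frozenset((u, v)) not in E and adj[u] & adj[v]}  # common neighbor
    return E | added


ef = transform([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])  # path 0-1-2-3
print(sorted(tuple(sorted(e)) for e in ef))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```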

The transformation $F$ has the following property:

Lemma A.1

Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, and define $G_1 \cup G_2 = (V_1 \cup V_2, E_1 \cup E_2)$. We say $G_1$ and $G_2$ are disjoint if $V_1 \cap V_2 = \emptyset$.

If $G = G_1 \cup G_2$, where $G_1$ and $G_2$ are disjoint, then $F(G) = F(G_1) \cup F(G_2)$, and $F(G_1)$ and $F(G_2)$ are disjoint.

Proof 

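Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$. For any pair $\{v_1 \in V_1, v_2 \in V_2\}$, there is no path between them in $G$. Suppose $(v_1, v_2) \in E_{\text{added}}$ for $G$. Then $v_1$ and $v_2$ must have been two hops from each other in $G$: a contradiction. Hence every added edge joins two nodes of the same $G_m$ ($m = 1, 2$); and because no path leaves $V_m$, two nodes of $V_m$ are two hops apart in $G$ exactly when they are two hops apart in $G_m$. This gives $F(G) = F(G_1) \cup F(G_2)$, with $F(G_1)$ and $F(G_2)$ disjoint.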


Theorem A.1 (Necessary condition for being in $\mathcal{E}$)

If an undirected, connected graph $H = (V_H, E_H)$ with $|V_H| \ge 3$ is in the set $\mathcal{E}$, then for any edge $(u, v) \in E_H$ there exists a node $k \in V_H$ such that $(u, k) \in E_H$ and $(v, k) \in E_H$.

Proof 

Since $H \in \mathcal{E}$, there exists a graph $G = (V, E)$ such that $F(G) = H$:

$$
V_H = V, \qquad E_H = E \cup E_{\text{added}}
$$

If $(u, v) \in E_{\text{added}}$, then $u$ and $v$ must be two hops from each other in $G$. Therefore, there exists a node $k \in V_H$ such that $(u, k) \in E_H$ and $(v, k) \in E_H$. Now let us consider the case where $(u, v) \in E$. Since $H$ is connected, $G$ is connected too, by Lemma A.1 (if $G$ were the union of two disjoint graphs, so would be $H$). Because $|V| \ge 3$, there must be a node $k \in V$, other than $u$ and $v$, that is a neighbor of $u$ or $v$ in $G$. If $k$ is a neighbor of both, we are done. If $k$ is a neighbor of only one of the two, say $u$, then $(k, v)$ must be in $E_{\text{added}}$. Q.E.D.


Any graph not satisfying this necessary condition does not belong to the set $\mathcal{E}$.

The graph in Figure A-1 is an example.
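The necessary condition of Theorem A.1 says that every edge of $H$ lies on a triangle, which is straightforward to test. The sketch below does not check the hypotheses that $H$ is connected and has at least three nodes; the function name is hypothetical. The path on three nodes, which is connected, fails the test and therefore is not the image under $F$ of any graph; we do not claim it is the graph shown in Figure A-1.

```python
def satisfies_theorem_a1(nodes, edges):
    """Necessary condition of Theorem A.1: every edge (u, v) has a node k
    adjacent to both u and v. Connectivity and |V_H| >= 3 are not checked."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return all(adj[u] & adj[v] for u, v in edges)


print(satisfies_theorem_a1([0, 1, 2], [(0, 1), (1, 2)]))          # False: path on 3 nodes
print(satisfies_theorem_a1([0, 1, 2], [(0, 1), (1, 2), (0, 2)]))  # True: triangle
```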